INTRODUCING VOLUME TWENTYFIVE


1. The first number of Sankhya: The Indian Journal of Statistics was 
published exactly thirty years, or one generation ago, in June 1933. In starting the 
twentyfifth volume I have much pleasure in announcing that Dr. C. Radhakrishna 
Rao would be associated with me as editor in future. The pleasure is all the greater 
because he is my former pupil; he attended my lectures as a student in the post- 
graduate course in statistics which was started for the first time in India in the 
University of Calcutta in 1941. After taking his M.A. degree in statistics, he joined 
the Indian Statistical Institute in 1943, and since then has been my colleague for 
twenty years. He has been, in actual fact, in charge of most of the editorial work 
for many years. I look forward to the day when he would assume full responsibility. 

2. Having attained on this day the age of seventy, I may take the oppor- 
tunity of looking both back and forward, a little. My mind goes back to 1933 when 
we were busy sending articles to the press for the first number of Sankhya. I had 
started working on the multivariate distance several years earlier, and had examined 
a portion of a large volume of individual measurements of various anthropometric 
characters for a large number of castes and tribes in North India which had been 
published in 1891 by H. H. Risley. The measurements had been taken by the 
same small group of observers; and were, therefore, suitable for purposes of compari- 
sons between castes and tribes. Karl Pearson had, however, condemned the material 
much earlier because of the discrepancies he had found in the average values given 
by Risley. After detailed scrutiny, I reached the conclusion that most of the dis- 
crepancies in individual measurements and indices could be traced to easily recognis- 
able copying or printing mistakes, use of wrong figures taken from adjoining rows 
or columns, mistakes in entering index tables, or obvious arithmetical slips like a 
displacement of a decimal point in calculations. Out of 142 discrepancies in indivi- 
dual values, 133 could be corrected and corroborated with practical certainty by cross- 
checks with appropriate index numbers; in eight cases, the corrections were plausible 
although they could not be confirmed, while only one single measurement was really 
doubtful and had to be rejected out of a total of 12,197 individual measurements 
and a total of 8,600 indices given by Risley. The real defect in Risley’s data had 
occurred during the calculation of average values; the primary data of individual 
measurements given by him could be used with safety, especially after applying the 
corrections which I had used.

3. I thought it would be useful to publish the revised values of Risley’s
data together with the detailed evidence in support of the corrections made by me.1
My young colleagues, who were only three or four at that time, were opposed
to the publication of this paper. The scrutiny and reconciliation of discrepancies
which I had carried out, they felt, could not be considered to be scientific work at
all; they were eager to prevent the ‘Professor’ from exposing himself to ridicule in
advanced countries by publishing this paper. However, due to a streak of contrariness
and obstinacy, I printed this article in the first issue of the journal.

4. It can be easily imagined with what joy and encouragement I read the
following lines in a letter dated 14 August 1933 from Ronald Aylmer Fisher :— 

“You are most heartily to be congratulated on the new Journal, and very 
especially on your own contributions. The work on Risley’s data will be 
most valuable. I shall hope to hear and read more of your contributions 
as time goes on. It is a splendid start.” 
It was characteristic of the most eminent statistician of the present age to have 
selected the paper on Risley’s data for special mention.

5. Fisher himself had said somewhere that the first responsibility of a
statistician is to cross-examine his data. I remember the vivid description he gave
me, during his first visit to the Indian Statistical Institute in 1937, about his
investigations in the monastery in Austria where Gregor Mendel had carried out his
experiments on the inheritance of characters in sweet peas. Mendel had announced in
his last scientific publication that he would publish in another paper his results on
three factor segregation, but did not do so. Fisher had an almost irresistible urge
to find out why Mendel ceased publication. Searching through old records, Fisher
traced the original observations which Mendel had intended to use for his unpublished
paper, and found that there was perfect agreement between observed and expected
results. Fisher surmised that such agreement had raised a suspicion in Mendel’s
mind that his assistant, who had been helping him in these experiments, had
deliberately changed the records to make them agree with expectations; Mendel had
refrained from publishing the results as he could not guarantee their accuracy.2

6. My mind also goes back to the day when I had the good fortune to estab- 
lish contact with R. A. Fisher. In 1923 I was working as Meteorologist in Calcutta 
in addition to teaching physics in the Calcutta University. Fisher was engaged 
in his researches on the design of experiments at Rothamsted Experimental Station, 
Harpenden. I had no connexion with agricultural research. By sheer chance, my 
attention was drawn to the question of “errors” in some agricultural field experi- 
ments, (in the form of a series of parallel plots sown with different varieties of rice, 
repeated in the same order in several blocks). I tried to eliminate, by crude gradua- 
tion, differences in soil fertility and published a paper in an agricultural journal.3 
Fisher saw this paper and immediately sent me reprints of his early papers on the 
design of experiments and also the paper on the distribution of the ratio of two 


variances. 

1 A Revision of Risley’s Anthropometric Data Relating to the Tribes and Castes of Bengal. Sankhya, 1, 1933, 76-105.
2 R. A. Fisher: “Has Mendel’s work been rediscovered?” Annals of Science, 1, 115-137, 1936.
3 Probable error of field experiments in agriculture, Agri. Jour. Ind. 20, 1925, 96.

7. While struggling with the analysis of variety trials on paddy, I had
begun to appreciate the need of radical improvements in agricultural field
experiments. When I read Fisher’s papers on this subject, I realized that he had not
only solved the problem at a theoretical level but had also supplied the basic tables
(for the z-distribution) to facilitate the use of his methods almost in a routine manner.
I could also appreciate how great was his achievement. I believe, I can claim to 
be the first convert to the Fisherian view of statistics; I have also tried to extend 
his ideas to the design of sample surveys. For me, the discovery of Fisher, nearly 
forty years ago, was an important factor in deepening my interest in statistics which 
was further strengthened by the impressions of the memorable day I spent with 
him at Rothamsted Agricultural Station in 1926 when I met him for the first time.

8. I also recall that it was at Fisher’s suggestion (as I came to know much 
later) that the newly established Imperial (now Indian) Council of Agricultural 
Research offered me in 1928 an annual grant of Rs. 2,500 (about £ 200, a princely 
sum for us in those days) to have a research assistant to take up some work in statistics. 
This grant led the way to the future development of the integrated programme of 
theoretical research, training, and applied projects which has been a characteristic 
feature of the Indian Statistical Institute. 

9. Fisher came to our Institute on eight occasions. He always stayed in 
our house in the Institute in Calcutta. This gave me and my young colleagues the 
opportunity to profit by his stimulating discussions and suggestions. The special 
needs of an underdeveloped country like India, had made it continually necessary 
for us to increase the scope of application of statistical methods in widely differing 
subject fields in natural and social sciences, technology and economic planning. In
such developments we received powerful support from R. A. Fisher, who quite early 
had a clear view of statistics as the new technology of the modern age. He also first 
formulated, in a precise way, the concept of the Indian Statistical Institute as a
higher technological institution having an analogous function in respect of statistics, 
although on a much smaller scale, to that of the higher technological institutions 
like the Zurich Federal School of Technology or the Massachusetts Institute of 
Technology, which had been established a hundred years ago to provide an integrated 
programme of research, training and projects in the field of engineering and techno- 
logy. In all these ways, Ronald Fisher had exercised more influence than any one 
else in the shaping of the policy and programme of the Indian Statistical Institute 
of which Sankhya is the official organ.

10. R. A. Fisher had said somewhere that he had learnt his statistics through 
computation, I presume, in the dual sense that no theoretical formulae are of any 
value unless these can be used in numerical terms at a concrete level, and also that
statisticians have to do the ‘dirty work’ of computation with their own hands. On
the occasion of his first visit to the Institute in December 1937, he requested me to 
give him a hand calculating machine. For his seven subsequent visits, a desk cal- 
culator always used to be placed in his room in advance; he used such a calculator 
every day during his last stay in Calcutta up to the middle of February 1962.

11. I am recalling the two points made by him, namely, the need of cross- 
examining the data and the importance of computational work in statistics, because 
both these points are of crucial significance in the future development of statistics
in India. During the last thirty years or the first generation of this journal, there 
has been a good deal of advance in the field of statistics in India. Indian statisticians 
have come to enjoy a good reputation abroad. Paradoxically, the Indian statistical 
system in respect of flow of factual information is recognized, within and outside 
the country, to be very weak. This is due to the fact that collection, processing and
publication of official statistics are still treated as administrative matters subject
to the principle of a monopolistic jurisdiction of one single administrative authority
for each type of information. The acceptability of statistical data is determined
by the status of the authority responsible for their collection or publication, usually
without any assessment of the reliability of the information.

4 Reference to R. A. Fisher’s address at the first convocation in 1962; also to Fisher’s speech in 1951, quoted by me in an annual review.

12. The only way to improve the quality of official statistics in India is
by testing their accuracy in accordance with accepted scientific principles, by
checks and cross-checks provided through multiple observations or through
independent sources of information. Sample surveys, with multiple and parallel or
inter-penetrating network of sub-samples, provide a speedy and economical way of
ascertaining the margin of uncertainty in an objective manner. It has been,
therefore, the policy of this journal to attach special importance to methods and
applications of sample surveys. The need of doing this would continue in future.

13. In the second generation, if I may call it so, of the journal, one
of its important tasks must be to foster the growth of the spirit of criticism without
which the advancement of science is not possible. In the early years, we used to
publish book reviews and also selective reviews and abstracts of statistical papers
published elsewhere; much to our regret, we have not been able to maintain these
two features. We feel it may be particularly helpful to use reviews of articles and
books, in a purposeful manner, to promote the growth of a critical appraisal of
the quality of research and of advances in the collection, processing and analysis of
statistical information in India. In future, it would be our policy to welcome reviews
both of papers published in journals and books.

14. From the very beginning, to help in the advancement of statistics in an 
underdeveloped country like India, we had adopted certain lines of policy which 
are somewhat unorthodox, for example, reprinting papers published elsewhere to 
make them more easily available in India, publishing statistical materials and data 
for purposes of record, or giving attention to primary work like scrutiny of data and 
computations. It is our intention to continue these features. At the same time, 
we shall continue to welcome contributions from all over the world, as we have been 
doing from the first issue of the journal. 

15. I should like to take this opportunity of placing on record my sense 
of gratitude to the late Narendra Nath Mookerjee, who with characteristic generosity 
had helped in the publication of the journal from the Art Press, in the early years. 


I remember the services rendered by my two young colleagues, Subhendu Sekhar 
Bose and Sudhir Kumar Banerj 


years, we are grateful for the help we have received from Anikendra Mahalanobis, 
Krishna Birendra Goswami and Dyutish Ranjan Banerjee in the publication of this 
journal. workers of the Eka Press, who have 
always helped ungrudgingly in printing the journal. 

adventure, even foolhardy, to have started a statistical 
years ago when our resources in research and in material 
It is because of the friendly cooperation and support which 


om within and outside India that this journal could make 
lic esteem. We offer our mos 


29 June, 1963                                                  P. C. Mahalanobis


LARGE SAMPLE SEQUENTIAL TESTS FOR 
COMPOSITE HYPOTHESES 


By D. R. COX 
Birkbeck College, University of London


SUMMARY. A sequential test for hypotheses involving nuisance parameters is developed from
maximum likelihood (m.l.) theory. The procedure is a slight modification of one outlined by 
Bartlett (1946a). Special cases are discussed. 


1. INTRODUCTION 


Except for a short section in a paper by Bartlett (1946a), previous work on 
sequential tests when there are nuisance parameters has been restricted to very special 
problems. Most work has concerned situations where there are simple sufficient 
statistics and where invariance ideas are applicable to the estimation of the nuisance 
parameters. Barnard (1946, 1952), Wald (1947, p. 80) and Cox (1952) have explored 
this from rather different points of view. In appropriate cases, a sequential test 
can be formed by finding at each step the relevant standard fixed-sample size statistic, 
and computing the ratio of the densities of this under the two base hypotheses $H_1$
and $H_2$. This ratio is then used with the usual stopping limits $(1-\beta)/\alpha$ and $\beta/(1-\alpha)$,
where $\alpha$ and $\beta$ are the probabilities of error under $H_1$ and $H_2$. The most important
example of this procedure is the sequential $t$ test (Rushton, 1950).
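The stopping rule just described can be sketched for the simplest case of two simple hypotheses. The following is a minimal illustration only, assuming Bernoulli (0/1) observations; the function name and the Bernoulli setting are assumptions made for the example and do not come from the paper. The limits are applied on the log scale, so the log likelihood ratio is compared with $\log\{(1-\beta)/\alpha\}$ and $\log\{\beta/(1-\alpha)\}$.

```python
import math


def sprt_bernoulli(xs, p1, p2, alpha=0.05, beta=0.05):
    """Wald-type sequential test of H1: p = p1 against H2: p = p2.

    xs is an iterable of 0/1 observations. Returns (decision, n), where
    decision is "H1", "H2", or None if the data ran out before a limit
    was reached. (Hypothetical helper written for illustration.)
    """
    upper = math.log((1 - beta) / alpha)   # decide for H2 at or above this
    lower = math.log(beta / (1 - alpha))   # decide for H1 at or below this
    log_lr = 0.0
    n = 0
    for x in xs:
        n += 1
        # Log likelihood ratio contribution of one observation.
        if x == 1:
            log_lr += math.log(p2 / p1)
        else:
            log_lr += math.log((1 - p2) / (1 - p1))
        if log_lr >= upper:
            return "H2", n
        if log_lr <= lower:
            return "H1", n
    return None, n
```

With nuisance parameters present the likelihood ratio itself is not available, which is the difficulty the rest of the paper addresses.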


Other results include that of Girshick (1946), who gave a procedure for com- 
paring two populations. However, his procedure often leads to an operating charac- 
teristic depending upon an undesirable combination of population parameters. For 
example, the test for comparing two normal variances $\sigma_1^2$ and $\sigma_2^2$ has an operating
characteristic depending on $1/\sigma_1^2 - 1/\sigma_2^2$, and this would not usually be what is wanted.
Another special procedure is for comparing two binomial probabilities (Wald, 1947, 
p. 106). This will be discussed in Section 4. 


In the present paper, a slight modification is given of Bartlett's procedure,
which is based on maximum likelihood (m.l.) theory. A number of special cases are
then examined.


We consider, for simplicity, tests corresponding to the Wald likelihood ratio 
test for comparing two simple hypotheses. Extensions to the comparison of three 
hypotheses and to the important closed schemes of Armitage (1957) can be made with- 
out difficulty. 


2. TEST BASED ON MAXIMUM LIKELIHOOD ESTIMATES


Let the distribution of the observations be known except for the unknown
parameters $(\theta, \phi)$. Let the two base hypotheses be $H_i : \theta = \theta_i$ $(i = 1, 2)$, the quantity
$\phi$ being a nuisance parameter. The dimensionality of $\theta$ and $\phi$ does not matter: we
shall write out formulae as if both are one-dimensional.


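After $n$ observations, let $L_n(x_n, \theta, \phi)$ be the log likelihood. If $\phi$ is known,
and equal to $\phi_0$ say, the test is based on

$$L_n(x_n, \theta_2, \phi_0) - L_n(x_n, \theta_1, \phi_0), \tag{2.1}$$

using $A = \log\{(1-\beta)/\alpha\}$ and $B = \log\{\beta/(1-\alpha)\}$ as stopping limits. We deal here
with large sample theory in which the log likelihood is expanded as far as quadratic
terms. In a rigorous treatment we would consider a sequence of schemes depending
on a parameter $N$ in such a way that, as $N \to \infty$, the relevant sample sizes are
proportional to $N$, and $\theta_1$, $\theta_2$ and the true value $\theta$ differ by amounts of order $1/\sqrt{N}$.

That is, in expanding (2.1) about the true value $\theta$, $\theta_1 - \theta$ and $\theta_2 - \theta$ are to be treated
as of order $1/\sqrt{n}$. Thus we have that for $i = 1, 2$,

$$L_n(x_n, \theta_i, \phi_0) = L_n(x_n, \theta, \phi_0) + (\theta_i - \theta)\frac{\partial L_n(x_n, \theta, \phi_0)}{\partial\theta} + \tfrac{1}{2}(\theta_i - \theta)^2\frac{\partial^2 L_n(x_n, \theta, \phi_0)}{\partial\theta^2},$$

where the last two terms are of order 1 in probability. Thus (2.1) becomes

$$(\theta_2 - \theta_1)\frac{\partial L_n(x_n, \theta, \phi_0)}{\partial\theta} + \tfrac{1}{2}(\theta_2 - \theta_1)(\theta_1 + \theta_2 - 2\theta)\frac{\partial^2 L_n(x_n, \theta, \phi_0)}{\partial\theta^2}. \tag{2.2}$$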

Now suppose that $\phi$ is unknown, and that no prior probability distribution
of $\phi$ is available. Let $(\hat\theta_n, \hat\phi_n)$ be the maximum likelihood estimate of $(\theta, \phi)$ based on
$x_n$. The method of Bartlett (1946) involved the use of two maximum likelihood
estimates of $\phi$, one assuming $\theta = \theta_1$, the other that $\theta = \theta_2$. This is, however,
unnecessary to the order of approximation being considered here.

It is plausible to consider instead of (2.1)

$$L_n(x_n, \theta_2, \hat\phi) - L_n(x_n, \theta_1, \hat\phi). \tag{2.3}$$

Expanding (2.3) about the true point $(\theta, \phi)$, we get

$$(\theta_2 - \theta_1)\frac{\partial L_n(x_n, \theta, \phi)}{\partial\theta} + \tfrac{1}{2}(\theta_2 - \theta_1)(\theta_1 + \theta_2 - 2\theta)\frac{\partial^2 L_n(x_n, \theta, \phi)}{\partial\theta^2} + (\theta_2 - \theta_1)(\hat\phi - \phi)\frac{\partial^2 L_n(x_n, \theta, \phi)}{\partial\theta\,\partial\phi}. \tag{2.4}$$

Thus the test based on (2.3) is asymptotically equivalent to that when $\phi$ is known if

$$\frac{1}{n}\frac{\partial^2 L_n(x_n, \theta, \phi)}{\partial\theta\,\partial\phi} \to 0 \tag{2.5}$$

in probability. This is the condition that $\hat\theta$ and $\hat\phi$ are asymptotically independent.
Therefore, when (2.5) is satisfied, we can replace $\phi$ by $\hat\phi$ and apply the usual formulae
for simple hypotheses. An asymptotic optimum property will hold corresponding
to the exact optimum property when $\phi$ is known. The practical relevance of this
optimum property may, however, be slight unless both $\theta$ is likely to be near to $\theta_1$ or
$\theta_2$, and also the cost of sampling is proportional to the number of observations.

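If (2.5) is not satisfied, the last term in (2.4) is comparable with the others,
and to neglect the sampling fluctuations in $\hat\phi$ is wrong. To simplify (2.4), we make
the usual substitutions of maximum likelihood theory writing

$$\frac{\partial^2 L_n(x_n, \theta, \phi)}{\partial\theta^2} \sim -nI_{\theta\theta}, \qquad \frac{\partial^2 L_n(x_n, \theta, \phi)}{\partial\theta\,\partial\phi} \sim -nI_{\theta\phi}, \qquad \frac{\partial^2 L_n(x_n, \theta, \phi)}{\partial\phi^2} \sim -nI_{\phi\phi}, \tag{2.6}$$

where the $I$’s are known or can be estimated consistently. In particular, if the
observations are independently and identically distributed with density function $f(x; \theta, \phi)$,
we have the usual formulae, such as

$$I_{\theta\theta} = -E\left\{\frac{\partial^2 \log f(x; \theta, \phi)}{\partial\theta^2}\right\}. \tag{2.7}$$

The maximum likelihood estimates $\hat\theta$, $\hat\phi$ satisfy asymptotically the equations

$$I_{\theta\theta}(\hat\theta - \theta) + I_{\theta\phi}(\hat\phi - \phi) = \frac{1}{n}\frac{\partial L_n(x_n, \theta, \phi)}{\partial\theta},$$

$$I_{\theta\phi}(\hat\theta - \theta) + I_{\phi\phi}(\hat\phi - \phi) = \frac{1}{n}\frac{\partial L_n(x_n, \theta, \phi)}{\partial\phi}. \tag{2.8}$$

We can now express (2.4) in terms of $\hat\theta$, $\hat\phi$. In fact (2.4) is asymptotically equivalent
to $nI_{\theta\theta}(\theta_2 - \theta_1)\{\hat\theta - \tfrac{1}{2}(\theta_1 + \theta_2)\}$, suggesting that the test should be based on the
computation of

$$T_n = n\{\hat\theta - \tfrac{1}{2}(\theta_1 + \theta_2)\}. \tag{2.9}$$

Now $E(T_n) \sim n\{\theta - \tfrac{1}{2}(\theta_1 + \theta_2)\}$, $\operatorname{var}(T_n) \sim nI^{\theta\theta}$, where

$$\frac{1}{I^{\theta\theta}} = I_{\theta\theta} - \frac{I_{\theta\phi}^2}{I_{\phi\phi}}. \tag{2.10}$$

Further $T_n$ can, from (2.8), be expressed as a linear combination with constant
coefficients of

$$\frac{\partial L_n(x_n, \theta, \phi)}{\partial\theta} \quad\text{and}\quad \frac{\partial L_n(x_n, \theta, \phi)}{\partial\phi}. \tag{2.11}$$

If, for example, the observations are independent and identically distributed, the
quantities (2.11) are sums of independent and identically distributed random variables
and hence, asymptotically, so is $T_n$. The same is true much more generally. That is,
the stochastic process $\{T_n\}$ is a random walk in which the mean increment per step
is $\theta - \tfrac{1}{2}(\theta_1 + \theta_2)$ and the variance per step is $I^{\theta\theta}$. This has in general to be estimated
from $(\hat\theta, \hat\phi)$, but this introduces negligible error to the order being considered.
Alternatively $I^{\theta\theta}$ could be estimated by substituting $\theta = \tfrac{1}{2}(\theta_1 + \theta_2)$, $\phi = \hat\phi$. We can
therefore use the theory for normally distributed observations (Wald, 1947, p. 122).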
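The reduction of (2.4) to a multiple of the statistic (2.9) is stated without the intermediate algebra. A sketch of that step, assuming the expansion (2.4) has the three terms described in the text and using only the substitutions (2.6) and the first equation of (2.8), is:

```latex
\begin{align*}
(2.4) &\simeq (\theta_2-\theta_1)\Big[\frac{\partial L_n}{\partial\theta}
        - \tfrac12\, n I_{\theta\theta}\,(\theta_1+\theta_2-2\theta)
        - n I_{\theta\phi}\,(\hat\phi-\phi)\Big] \\
      &= (\theta_2-\theta_1)\Big[n I_{\theta\theta}(\hat\theta-\theta)
        + n I_{\theta\phi}(\hat\phi-\phi)
        - \tfrac12\, n I_{\theta\theta}\,(\theta_1+\theta_2-2\theta)
        - n I_{\theta\phi}(\hat\phi-\phi)\Big] \\
      &= n I_{\theta\theta}\,(\theta_2-\theta_1)\,
         \big\{\hat\theta-\tfrac12(\theta_1+\theta_2)\big\}.
\end{align*}
```

The terms in $\hat\phi - \phi$ cancel, which is why the nuisance parameter affects the test only through the variance $I^{\theta\theta}$ of $\hat\theta$.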
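If we set stopping limits at values of $T_n$ corresponding to constant values
of the estimated difference of log likelihoods (2.3), the limits should be at

$$\frac{I^{\theta\theta}}{\theta_2 - \theta_1}\log\left(\frac{1-\beta}{\alpha}\right), \qquad \frac{I^{\theta\theta}}{\theta_2 - \theta_1}\log\left(\frac{\beta}{1-\alpha}\right); \tag{2.12}$$

the operating characteristic is, in Wald’s notation,

$$L(\theta) = \frac{\left(\dfrac{1-\beta}{\alpha}\right)^{h} - 1}{\left(\dfrac{1-\beta}{\alpha}\right)^{h} - \left(\dfrac{\beta}{1-\alpha}\right)^{h}}, \tag{2.13}$$

where

$$h = \frac{\theta_1 + \theta_2 - 2\theta}{\theta_2 - \theta_1}. \tag{2.14}$$

Also, the expected sample size is $(\theta \neq \tfrac{1}{2}\theta_1 + \tfrac{1}{2}\theta_2)$

$$\frac{I^{\theta\theta}\left[L(\theta)\log\left(\dfrac{\beta}{1-\alpha}\right) + \{1 - L(\theta)\}\log\left(\dfrac{1-\beta}{\alpha}\right)\right]}{(\theta_2 - \theta_1)\{\theta - \tfrac{1}{2}(\theta_1 + \theta_2)\}}. \tag{2.15}$$

Note that, as pointed out by Bartlett (1946b), the ratio of mean sample sizes
when $\phi$ is (a) known and (b) unknown is the same as the corresponding ratio of
asymptotic variances in ordinary maximum likelihood estimation.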
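As a rough numerical companion to the stopping limits and operating characteristic, the sketch below evaluates Wald's approximations for a random walk with variance per step $I^{\theta\theta}$. The function names are hypothetical, the formulae are the usual normal-theory approximations that neglect overshoot of the limits, and `i_inv` stands for a known or estimated value of $I^{\theta\theta}$.

```python
import math


def stopping_limits(theta1, theta2, i_inv, alpha=0.05, beta=0.05):
    """Lower and upper limits for T_n, assuming theta2 > theta1.

    i_inv is the variance per step of the random walk {T_n}.
    """
    scale = i_inv / (theta2 - theta1)
    lower = scale * math.log(beta / (1 - alpha))
    upper = scale * math.log((1 - beta) / alpha)
    return lower, upper


def operating_characteristic(theta, theta1, theta2, alpha=0.05, beta=0.05):
    """Approximate probability of deciding for H1 when the true value is theta."""
    h = (theta1 + theta2 - 2 * theta) / (theta2 - theta1)
    log_a = math.log((1 - beta) / alpha)
    log_b = math.log(beta / (1 - alpha))
    if abs(h) < 1e-12:
        # Limiting value as h -> 0, i.e. theta midway between theta1 and theta2.
        return log_a / (log_a - log_b)
    a_h = math.exp(h * log_a)
    b_h = math.exp(h * log_b)
    return (a_h - 1) / (a_h - b_h)
```

At $\theta = \theta_1$ the function returns $1 - \alpha$ and at $\theta = \theta_2$ it returns $\beta$, as the nominal error probabilities require.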


There are many procedures asymptotically equivalent to the one sketched
above:

(a) instead of the maximum likelihood estimate $\hat\theta$ we can use in (2.9) an
asymptotically equivalent estimate;

(b) for the consistent estimation of $I^{\theta\theta}$ a variety of methods will often be
available;

(c) if the parameter $\theta$ is transformed by a monotone transformation to
$g(\theta)$, the procedure can be operated in terms of

$$n[g(\hat\theta) - \tfrac{1}{2}\{g(\theta_1) + g(\theta_2)\}].$$

We shall not consider here (a) and (b) further. As for (c), it seems reasonable to choose
$\theta$ so that the likelihood function $L_n(x_n, \theta, \phi)$ is as far as possible quadratic with
parameters not sharply dependent on $(\theta, \phi)$, that is to try to make $\hat\theta$ as nearly as possible
normally distributed with constant variance.


For example, suppose that pairs of observations are taken from normal
populations of standard deviations $\sigma_1$, $\sigma_2$, it being required to make, by the present
method, a sequential test concerning $\sigma_1/\sigma_2$. It would be reasonable to take $\theta = \log(\sigma_1/\sigma_2)$.
The maximum likelihood estimate $\hat\theta$ after $n$ pairs of observations, Fisher’s $z$, is nearly
normally distributed with variance $1/(n-1)$. The procedure based on (2.9) [...]
variance ratio distribution (Cox, 1952), [...] characteristic and expected sample [...]
3. TESTS FOR THE MEAN OF A NORMAL DISTRIBUTION

Consider the sampling of a single normal population of unknown mean $\mu$
and variance $\sigma^2$. If we are interested in the mean of the population, there are two
main formulations, corresponding to defining the parameters $\theta$, $\phi$ by

(i) $\theta = \mu/\sigma$, $\phi = \sigma$;
(ii) $\theta = \mu$, $\phi = \sigma$.

The first is the form adopted in work on the sequential $t$ test, and the hypotheses
$\theta = \theta_1, \theta_2$ are, effectively, hypotheses about the probability of obtaining a positive
observation. The second formulation would be appropriate if certain absolute changes
in the mean are of interest, irrespective of the value of $\sigma$. This might be the case in
measurements on an extensive property.

The sequential procedure in case (ii) is obtained by using the usual estimate
$s$ to replace $\sigma$ in the procedure with $\sigma$ known (Wald, 1947, p. 122). Since $s$ and the
sample mean are independent, there is asymptotically no change in the properties of
the scheme from those when $\sigma$ is known.

In case (i), the maximum likelihood estimator $\hat\theta$ is, in the usual notation, $\bar{x}/s$,
and it follows from first principles or from the general theory, that asymptotically

$$\operatorname{var}(\hat\theta) = \frac{I^{\theta\theta}}{n} = \frac{1}{n}(1 + \tfrac{1}{2}\theta^2). \tag{3.1}$$

Thus the procedure defined by (2.9) and (2.12) is to use the statistic

$$n\{\hat\theta - \tfrac{1}{2}(\theta_1 + \theta_2)\} \tag{3.2}$$

with stopping limits at

$$\frac{1 + \tfrac{1}{2}\theta^2}{\theta_2 - \theta_1}\log\left(\frac{1-\beta}{\alpha}\right), \qquad \frac{1 + \tfrac{1}{2}\theta^2}{\theta_2 - \theta_1}\log\left(\frac{\beta}{1-\alpha}\right). \tag{3.3}$$

The mean sample size is, from (2.15) and (3.1), $(1 + \tfrac{1}{2}\theta^2)$ times what it would have
been with $\sigma$ known. The test based on (3.2) and (3.3) is asymptotically equivalent
to, and close to, the test in the form given by Rushton (1950) and the
National Bureau of Standards’ tables (1951).
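The procedure for case (i) can be sketched as follows. This is an illustrative reading rather than the author's own computation scheme: the function name is hypothetical, $\theta$ in the variance factor $1 + \tfrac{1}{2}\theta^2$ is replaced by its estimate $\hat\theta$ (one of the options mentioned in Section 2), and $s$ is taken with the usual $n - 1$ divisor, which is asymptotically equivalent to the maximum likelihood estimate.

```python
import math
import statistics


def normal_ratio_test_state(xs, theta1, theta2, alpha=0.05, beta=0.05):
    """State of the large-sample sequential test on theta = mu / sigma.

    xs holds the observations so far (at least two). Assumes theta2 > theta1.
    Returns (t_n, lower, upper, decision); decision is "H1", "H2" or None.
    """
    n = len(xs)
    theta_hat = statistics.fmean(xs) / statistics.stdev(xs)
    t_n = n * (theta_hat - 0.5 * (theta1 + theta2))
    # Estimated variance per step of the random walk {T_n}.
    i_inv = 1 + 0.5 * theta_hat ** 2
    scale = i_inv / (theta2 - theta1)
    lower = scale * math.log(beta / (1 - alpha))
    upper = scale * math.log((1 - beta) / alpha)
    if t_n >= upper:
        decision = "H2"
    elif t_n <= lower:
        decision = "H1"
    else:
        decision = None  # continue sampling
    return t_n, lower, upper, decision
```

In use, the function would be called after each new observation until a decision other than `None` is returned; being a large-sample approximation, it should not be trusted for very small `n`.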
4. COMPARISON OF BINOMIAL PROPORTIONS 


One of the most important sequential problems is the comparison of two
binomial proportions. Denote the two possible outcomes of each kind by 0 and 1.
We suppose here that the observations are paired, but this is not essential for the tests
to be developed below. There are four possible results for a pair of observations,
(i) [0, 0], (ii) [0, 1], (iii) [1, 0], (iv) [1, 1].

Wald (1947, p. 106) obtained a test by rejecting pairs of types (i) and (iv) and
examining the proportion of type (iii) among the remaining pairs. There are two
conditions for this to be completely appropriate (Cox, 1958). First, suppose that in the


SANKHYA : THE INDIAN JOURNAL OF STATISTICS : SERIES A 


$i$-th pair the odds of a 1 are respectively $\phi_i$ and $\theta\phi_i$ in the two populations, where
the $\phi_i$’s are nuisance parameters, one for each pair. Secondly, suppose that $\theta$, the
odds ratio, is the sole parameter of interest. That is, the probability of a 1 in the $i$-th
pair is $\phi_i/(1+\phi_i)$ in the first group and $\theta\phi_i/(1+\theta\phi_i)$ in the second group. Note that even
if this probability model holds, $\theta$ is not necessarily the parameter of interest. For
example, there may be economic reasons for being concerned with the difference
between the overall proportions of 1’s in the two groups.

If, however, we observe two independent series of Bernoulli trials, Wald’s
procedure is not completely efficient (Barnard, 1946; Wald, 1947, p. 108). For it is
not based on the sufficient statistics for the problem. From now on, suppose that all
trials are independent and that the probabilities of a 1 are $\lambda_1$ and $\lambda_2$ in the two
populations. We can formulate various hypotheses to which the theory of Section 2 can
be applied and shall here consider two, corresponding to situations in which (i) the
difference of probabilities and (ii) the ratio of odds are the parameters of interest.
For (i) the natural parametrization is to take

$$\lambda_1 = \phi, \qquad \lambda_2 = \phi + \theta. \tag{4.1}$$

For (ii) it is perhaps better to take $\theta$ as the difference of log odds rather than the ratio
of odds (see the final remarks of Section 2). That is, we write

$$\lambda_1 = \frac{e^{\phi}}{1 + e^{\phi}}, \qquad \lambda_2 = \frac{e^{\phi+\theta}}{1 + e^{\phi+\theta}}. \tag{4.2}$$

In both (4.1) and (4.2) the base hypotheses are $\theta = \theta_1, \theta_2$. Often $\theta_1 = 0$,
corresponding to a null hypothesis $\lambda_1 = \lambda_2$.


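We can now apply the results of Section 2 in a routine way. After $n$ pairs of
observations, let there be $r_n^{(1)}$, $r_n^{(2)}$ 1’s in the two groups. The test statistics (2.9)
are

case (i)
$$r_n^{(2)} - r_n^{(1)} - \tfrac{1}{2}n(\theta_1 + \theta_2); \tag{4.3}$$

case (ii)
$$n\log\left[\frac{r_n^{(2)}\{n - r_n^{(1)}\}}{r_n^{(1)}\{n - r_n^{(2)}\}}\right] - \tfrac{1}{2}n(\theta_1 + \theta_2), \tag{4.4}$$

and the stopping limits are given by (2.12) with $I^{\theta\theta}$ equal to

case (i)
$$\frac{r_n^{(1)}\{n - r_n^{(1)}\}}{n^2} + \frac{r_n^{(2)}\{n - r_n^{(2)}\}}{n^2}; \tag{4.5}$$

case (ii)
$$\frac{n^2}{r_n^{(1)}\{n - r_n^{(1)}\}} + \frac{n^2}{r_n^{(2)}\{n - r_n^{(2)}\}}. \tag{4.6}$$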
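The two test statistics and their stopping limits can be computed from the running counts of 1's. The sketch below is a minimal illustration under stated assumptions: the function name and the `case` labels are hypothetical, the variance terms are the usual estimated variances per pair of the difference of proportions and of the difference of log odds, and the log odds case requires both counts to lie strictly between 0 and $n$.

```python
import math


def binomial_test_state(r1, r2, n, theta1, theta2, case, alpha=0.05, beta=0.05):
    """Statistic T_n and stopping limits after n pairs.

    r1, r2 are the numbers of 1's in the two groups; case is "difference"
    (difference of probabilities) or "log_odds" (difference of log odds).
    Assumes theta2 > theta1. Returns (t_n, lower, upper).
    """
    p1, p2 = r1 / n, r2 / n
    if case == "difference":
        theta_hat = p2 - p1
        i_inv = p1 * (1 - p1) + p2 * (1 - p2)
    elif case == "log_odds":
        # Undefined when a count is 0 or n; sampling must continue in that case.
        theta_hat = math.log(p2 / (1 - p2)) - math.log(p1 / (1 - p1))
        i_inv = 1 / (p1 * (1 - p1)) + 1 / (p2 * (1 - p2))
    else:
        raise ValueError("case must be 'difference' or 'log_odds'")
    t_n = n * (theta_hat - 0.5 * (theta1 + theta2))
    scale = i_inv / (theta2 - theta1)
    lower = scale * math.log(beta / (1 - alpha))
    upper = scale * math.log((1 - beta) / alpha)
    return t_n, lower, upper
```

For example, with 50 ones out of 100 in each group the estimated variance terms are 1/2 and 8 in the two cases, and sampling continues in both.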
To illustrate these results, consider the expected numbers of pairs to reach
a decision, in comparable situations, of three tests:

(A) Wald's test for the odds ratio;

(B) the maximum likelihood test (4.3) for the difference of probabilities;

(C) the maximum likelihood test (4.4) for the difference of log odds.

Two situations will be considered, one in which the proportion of pairs contributing
to (A) is near its maximum of $\tfrac{1}{2}$, the other in which the proportion is much
lower. The details are

I (a) $\lambda_1 = \lambda_2 = 0.5$ versus (b) $\lambda_1 = 0.5$, $\lambda_2 = 0.6$;
II (a) $\lambda_1 = \lambda_2 = 0.1$ versus (b) $\lambda_1 = 0.1$, $\lambda_2 = 0.2$.

For test (A) the base hypotheses are of odds ratios I: 1 versus 1.5; II: 1 versus 2.25.
For test (B) the base hypotheses are of differences of 0 and 0.1, in both cases. For
test (C) the base hypotheses are of differences I: 0 versus log 1.5; II: 0 versus
log 2.25.

The comparison of (A), (B) and (C) does not depend on the particular $\alpha$ and
$\beta$ chosen, but for definiteness we take $\alpha = \beta = 0.05$. Note that if all sample
proportions are fairly near $\tfrac{1}{2}$, expressions (4.5) and (4.6) are near to $\tfrac{1}{2}$ and 8 respectively.
In both cases the estimate of $\phi$ is correlated with $\hat\theta$, so that, for given $\alpha$, $\beta$,
there is an increase in mean sample size as compared with the situation in which $\phi$
is known.
TABLE. MEAN NUMBER OF PAIRS REQUIRED IN SEQUENTIAL
TESTS (A), (B), (C) UNDER HYPOTHESES I (a) $\lambda_1 = \lambda_2 = 0.5$,
(b) $\lambda_1 = 0.5$, $\lambda_2 = 0.6$; II (a) $\lambda_1 = \lambda_2 = 0.1$, (b) $\lambda_1 = 0.1$, $\lambda_2 = 0.2$

                                          asymptotic mean sample size
                                          case I                case II
test                                      hyp. (a)   hyp. (b)   hyp. (a)   hyp. (b)

(A) Wald                                  260        263        184        134
    proportion of “effective” pairs       0.5        0.5        0.18       0.26

(B) m.l. on difference
    of probabilities                      265        260        95         132

(C) m.l. on log
    odds ratio                            258        263        179        140

More extensive calculations are desirable. The general conclusion seems to be
that when the proportion of “effective” pairs in Wald’s test is not too small, then
there is little to choose between the tests, but that, at least in some cases, an
appreciable reduction in mean sample size can be obtained from the test (4.3). In
interpreting the Table, recall that the results are asymptotic so that it is unlikely that
the units figure has much meaning.
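The entries of the Table for tests (B) and (C) can be checked from the expected sample size formula (2.15), evaluated at each base hypothesis, where the operating characteristic equals $1 - \alpha$ or $\beta$. The sketch below is an illustrative check, not the author's calculation; the function names are hypothetical and the variance terms are the population versions of (4.5) and (4.6), namely $\lambda_1(1-\lambda_1) + \lambda_2(1-\lambda_2)$ and its log odds counterpart.

```python
import math


def mean_pairs(i_inv, theta1, theta2, under_h2, alpha=0.05, beta=0.05):
    """Approximate expected number of pairs when theta equals theta1 or theta2."""
    theta = theta2 if under_h2 else theta1
    oc = beta if under_h2 else 1 - alpha   # probability of deciding for H1
    num = i_inv * (oc * math.log(beta / (1 - alpha))
                   + (1 - oc) * math.log((1 - beta) / alpha))
    den = (theta2 - theta1) * (theta - 0.5 * (theta1 + theta2))
    return num / den


def i_inv_difference(l1, l2):
    return l1 * (1 - l1) + l2 * (1 - l2)


def i_inv_log_odds(l1, l2):
    return 1 / (l1 * (1 - l1)) + 1 / (l2 * (1 - l2))


def table_rows():
    """Rounded mean numbers of pairs for tests (B) and (C), keyed by (case, hyp)."""
    rows = {}
    for case, (l1, l2_alt) in {"I": (0.5, 0.6), "II": (0.1, 0.2)}.items():
        diff = l2_alt - l1
        log_odds = math.log(l2_alt / (1 - l2_alt)) - math.log(l1 / (1 - l1))
        for hyp, l2, under_h2 in (("a", l1, False), ("b", l2_alt, True)):
            rows[(case, hyp)] = (
                round(mean_pairs(i_inv_difference(l1, l2), 0.0, diff, under_h2)),
                round(mean_pairs(i_inv_log_odds(l1, l2), 0.0, log_odds, under_h2)),
            )
    return rows
```

Calling `table_rows()` gives, for each case and hypothesis, the pair of values for tests (B) and (C); they agree with the corresponding rows of the Table.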


One further point concerns the increase, if any, in sample size due to the
introduction of a nuisance parameter. This can be assessed by comparing the variances
of the maximum likelihood estimators of the relevant parameters. Thus, suppose
we compared the hypotheses $\lambda_1 = 0.5$, $\lambda_2 = 0.6$ and wanted an operating
characteristic depending only on $\lambda_2$. Then only single observations, not pairs, need be made.
Further, the variance of $\hat\lambda_2$ is approximately $\{2\sqrt{n}\}^{-2}$. But in test (B) the variance of
$\hat\theta$ after $n$ pairs is approximately $\{\sqrt{2n}\}^{-2}$, because the difference of two proportions
is taken. Thus in effect there is a fourfold increase in sample size due to the
introduction of the nuisance parameter. On the other hand, suppose that $\lambda_1 = \phi - \theta$,
$\lambda_2 = \phi + \theta$ and $\phi = \tfrac{1}{2}$. Then $\hat\phi$, $\hat\theta$ are uncorrelated and the mean sample size is the
same whether or not $\phi$ is regarded as known.
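The fourfold figure can be traced by equating variances. The following sketch assumes proportions near $\tfrac{1}{2}$, so that each binomial variance per observation is close to $\tfrac{1}{4}$; the symbol $m$ for the number of pairs is introduced here only for the comparison.

```latex
\begin{align*}
\operatorname{var}(\hat\lambda_2) &\approx \frac{1}{4n}
   && \text{($n$ single observations)},\\
\operatorname{var}(\hat\theta) &\approx \frac{1}{4m}+\frac{1}{4m}=\frac{1}{2m}
   && \text{($m$ pairs, i.e. $2m$ observations)},\\
\frac{1}{2m}=\frac{1}{4n} &\iff m = 2n
   && \text{so that } 2m = 4n \text{ observations are needed.}
\end{align*}
```

One factor of two comes from the doubled variance of a difference, the other from each pair costing two observations.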


5. DISCUSSION

Two main extensions of the results of this paper are worth considering.
One is to processes observed in continuous time. For example, the sequential
comparison of survivor curves (Armitage, 1959), using maximum likelihood estimators
recomputed continuously in time, would require such an extension. The second
development, which is likely to be much more difficult to accomplish, is to obtain more
refined approximations to the properties of the tests. This would involve considering
the stochastic process of maximum likelihood estimators corresponding to all
possible sample sizes and representing this by something more refined than a sequence
of means of independent random variables.
ON ASYMPTOTIC EXPANSIONS FOR SUMS OF INDEPENDENT
RANDOM VARIABLES WITH A LIMITING 
STABLE DISTRIBUTION 


By HARALD CRAMER 
University of Stockholm 

SUMMARY. The sum of $n$ independent and identically distributed random variables will, under certain known conditions and after appropriate normalization, converge in distribution. The limiting distribution will then belong to the class of stable distributions. In the particular case when the limiting distribution is normal it is well known that, under certain additional conditions, the distribution function $F_n(x)$ of the normalized sum admits an asymptotic expansion in powers of $n^{-1/2}$, valid as $n$ tends to infinity. The present paper considers the analogous question for the case of a non-normal stable limiting distribution. Sufficient conditions for the validity of an asymptotic expansion for $F_n(x)$ in this case are given, and explicit expressions for the terms of the expansion are deduced. As in the normal case, the terms are functions of $x$, multiplied with certain negative powers of $n$.


1. INTRODUCTION 


Let $x_1, x_2, \ldots$ be independent and identically distributed random variables, with a given common distribution function (d.f.) $F(x)$, and the characteristic function (c.f.)

$$f(t) = \int_{-\infty}^{\infty} e^{itx}\, dF(x).$$

If it is possible to find constants $B_n > 0$ and $A_n$ such that the d.f. of the normalized sum

$$\frac{x_1 + \cdots + x_n - A_n}{B_n} \qquad (1.1)$$

tends to a limiting d.f. $G(x)$ as $n$ tends to infinity, we say that $F(x)$ belongs to the domain of attraction of $G(x)$.

Denoting by $F_n(x)$ and $f_n(t)$ the d.f. and c.f. of the random variable (1.1) we then have, for any continuity point $x$ of $G(x)$, and for all real $t$,

$$F_n(x) = F^{n*}(A_n + B_n x) \to G(x), \qquad f_n(t) = e^{-iA_n t/B_n}\left[f\!\left(\frac{t}{B_n}\right)\right]^n \to g(t), \qquad (1.2)$$

where $F^{n*}(x)$ denotes the $n$ times repeated convolution of $F(x)$ with itself, while $g(t)$ is the c.f. corresponding to the d.f. $G(x)$.
It is well known that, in order that $G(x)$ should possess a domain of attraction in this sense, it is necessary and sufficient that $G(x)$ should be the d.f. of a stable distribution. The c.f. $g(t)$ is then given by the expression

$$\log g(t) = iht - c|t|^{\alpha}\,[1 + i\theta\,\omega(t, \alpha)], \qquad (1.3)$$

where $h$ is any real constant, while $c$, $\alpha$ and $\theta$ are constants such that $c \ge 0$, $0 < \alpha \le 2$, $-1 \le \theta \le 1$. Finally,

$$\omega(t, \alpha) = \begin{cases} \operatorname{sgn} t \cdot \tan\dfrac{\pi\alpha}{2} & \text{when } \alpha \ne 1, \\[2mm] \operatorname{sgn} t \cdot \dfrac{2}{\pi}\log|t| & \text{when } \alpha = 1. \end{cases} \qquad (1.4)$$

The case $c = 0$ is trivial, and will be excluded from our considerations, so that we may throughout assume $c > 0$.

When the limiting relations (1.2) hold, we can evidently always, by an appropriate modification of $A_n$ and $B_n$, obtain a limiting distribution such that, in the expression (1.3) for the c.f., we have $h = 0$ and $c = 1$.

The parameter $\alpha$ is called the characteristic exponent of the stable distribution. For $\alpha = 2$ the distribution is normal; for $0 < \alpha < 2$ we have non-normal stable distributions.
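
As a numerical companion to the stable characteristic function described above, the following sketch evaluates its logarithm. It assumes the standard form $iht - c|t|^{\alpha}[1 + i\,\mathrm{skew}\cdot\omega(t,\alpha)]$; the argument name `skew` stands for the skewness constant lying in $[-1, 1]$, and all function names are illustrative.

```python
import math


def omega(t: float, alpha: float) -> float:
    """The function of (1.4): sgn(t) tan(pi*alpha/2), or sgn(t) (2/pi) log|t| at alpha = 1."""
    sign = (t > 0) - (t < 0)
    if alpha != 1:
        return sign * math.tan(math.pi * alpha / 2)
    return sign * (2 / math.pi) * math.log(abs(t))


def log_stable_cf(t: float, alpha: float, skew: float, c: float = 1.0, h: float = 0.0) -> complex:
    """Logarithm of a stable c.f. in the assumed form of (1.3)."""
    if t == 0:
        return 0j
    return complex(0.0, h * t) - c * abs(t) ** alpha * (1 + 1j * skew * omega(t, alpha))


# alpha = 2 gives the normal case, alpha = 1 with skew = 0 the Cauchy case.
print(log_stable_cf(1.5, 2.0, 0.3))
print(log_stable_cf(-2.0, 1.0, 0.0))
```

The value at $-t$ is the complex conjugate of the value at $t$, as it must be for the c.f. of a real random variable.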


In order that $F(x)$ should belong to the domain of attraction of a non-normal stable law with the characteristic exponent $\alpha$, it is necessary and sufficient that we should have, as $x \to \infty$,

$$F(-x) = \frac{h_1(x)}{x^{\alpha}}, \qquad 1 - F(x) = \frac{h_2(x)}{x^{\alpha}}, \qquad (1.5)$$

where for any constant $k > 0$

$$\frac{h_i(kx)}{h_i(x)} \to 1 \qquad (i = 1, 2),$$

while the ratio $h_1(x)/h_2(x)$ tends to a constant limit (cf. Gnedenko and Kolmogorov (1949, p. 175); also Dynkin (1955)). We may, e.g., take $h_i(x) = c_i (\log x)^{q}$.

Particularly important is the case when the $h_i(x)$ can be taken as constants, so that

$$F(-x) \sim \frac{c_1}{x^{\alpha}}, \qquad 1 - F(x) \sim \frac{c_2}{x^{\alpha}}, \qquad (1.6)$$

where $c_1 \ge 0$, $c_2 \ge 0$, and $c_1 + c_2 > 0$. The normalizing constants $B_n$ in (1.1) can then be determined by the formula

$$B_n = b\,n^{1/\alpha}, \qquad (1.7)$$

where $b > 0$ is a constant. In this case, $F(x)$ is said to belong to the domain of normal attraction of the limiting stable law (cf. Gnedenko and Kolmogorov (1949), p. 191).

In two papers (Cramér, 19.., 1928) and in my Cambridge Tract (Cramér, 1937) I proved the existence of an asymptotic expansion for $F_n(x)$, as $n \to \infty$, in the case of a normal limiting distribution. I also considered the case when the $x_i$ in (1.1) are not identically distributed. Moreover, I gave (Cramér, 1928) a similar asymptotic expansion for the probability density $F_n'(x)$.

All these results, for the normal limit law, have since been generalized and improved by various authors [Esseen (1944); Gnedenko and Kolmogorov (1949); Petrov (1959) and others]. For the case of a non-normal stable limit law, on the other hand, comparatively little seems to be known. Some results have been given in two papers by Bergström (1951 and 1953), and I have recently published a preliminary note on the subject in the Anniversary Volume dedicated to G. Pólya (Cramér, 1962).

In the present paper, the problem of asymptotic expansions for $F_n(x)$ will be considered for the case of normal attraction to a non-normal stable law. It will be shown that, when appropriate conditions are imposed on the given d.f. $F(x)$, there exists for the d.f. $F_n(x)$ given by (1.2) an asymptotic expansion as $n \to \infty$, the successive terms of which tend to zero as certain, in general fractional, powers of $1/n$. Similar expansions can be shown to exist in the case of non-identically distributed variables and also, under appropriate conditions, for the probability density $F_n'(x)$.

In the case of non-normal attraction, the situation is in general quite different. When e.g., the functions $h_i(x)$ in (1.5) are of the form $c_i (\log x)^{q}$ with a non-integral $q$, there is, under similar conditions as in the normal attraction case, an asymptotic expansion for $F_n(x)$. However, the successive terms of this expansion only tend to zero as powers of $1/\log n$. This case will not be further dealt with in the present paper.

2. THEOREMS ON ASYMPTOTIC EXPANSIONS


Throughout the following, $\alpha$, $\beta$ and $\gamma$ will be real constants satisfying the relations

$$0 < \alpha < 2, \qquad \alpha < \beta < \gamma. \qquad (2.1)$$

Further, $c_1$ and $c_2$ are non-negative constants, at least one of which differs from zero, while $d_1$ and $d_2$ are any real constants. In particular, we may have $d_1 = d_2 = 0$. However, if $c_1$ or $c_2$ is equal to zero, we assume that the corresponding $d$ is non-negative. Let $F(x)$ be a given d.f. satisfying the following conditions (A) and (B):

(A) As $x \to \infty$, we have

$$F(-x) = \frac{c_1}{x^{\alpha}} + \frac{d_1}{x^{\beta}} + r_1(x),$$

$$1 - F(x) = \frac{c_2}{x^{\alpha}} + \frac{d_2}{x^{\beta}} + r_2(x),$$

$$r_i(x) = O(x^{-\gamma}) \qquad (i = 1, 2).$$

(B) In the case when $0 < \gamma < 1$, the functions $r_1(x)$ and $r_2(x)$ are assumed to be monotone for all sufficiently large $x > 0$.
It will be seen that condition (A) implies considerably stronger assumptions about the infinitary behaviour of $F(x)$ than do the relations (1.6). Some assumptions of this kind will be necessary in order to get more precise information concerning the asymptotic behaviour of $F_n(x)$ for large $n$ than the one provided by the limit relations (1.2). However, the method used below will be applicable even if condition (A) is modified in various ways, e.g. by the introduction of further terms in the asymptotic expressions of $F(-x)$ and $1 - F(x)$.

For our Theorem 2, we shall also use the following condition (C), bearing on the c.f. $f(t)$ corresponding to $F(x)$:

(C) $\limsup |f(t)| < 1$ as $|t| \to \infty$.

It is well known that, in particular, this condition is always satisfied when the d.f. $F(x)$ contains an absolutely continuous component.

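We shall denote by $G_{\alpha}(x)$ the stable d.f. corresponding to the c.f. $g_{\alpha}(t)$ obtained from (1.3) by taking $h = 0$ and $c = 1$, so that

$$g_{\alpha}(t) = \exp\,[-|t|^{\alpha}(1 + i\theta\,\omega(t, \alpha))], \qquad (2.2)$$

where we now take $\theta = \dfrac{c_1 - c_2}{c_1 + c_2}$, while $\omega(t, \alpha)$ is given by (1.4).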
Further, we shall use the notation

$$k = k_1\alpha + k_2(\beta - \alpha) + k_3(2 - \alpha) + k_4, \qquad (2.3)$$

where $k_1$, $k_2$, $k_3$ and $k_4$ are non-negative integers, at least one of which differs from zero. It follows, in particular, that we always have $k > 0$. If $P_k(t)$ is a polynomial in $t$ of degree $k_1 + k_2 + k_3 + k_4 - 1$, with complex coefficients depending on the $k_i$, we write
o 
Ge: PA= j] PPP). (2.4) 
m ò 
It will be shown below in Lemma 1 that $G(x; P_k)$ is a function of the real variable $x$ which is everywhere continuous, has derivatives of all orders, tends to zero as $x \to \pm\infty$, and is of bounded variation over the whole real axis.

We shall now state our two main theorems.

Theorem 1: Suppose that $\alpha$, $\beta$ and $\gamma$ are not integers. If the given d.f. $F(x)$ satisfies conditions (A) and (B) it is possible to choose the normalizing constants $A_n$ and $B_n = b\,n^{1/\alpha}$ in (1.1), and the polynomials $P_k$ such that, as $n \to \infty$,

$$F_n(x) = G_{\alpha}(x) + \sum_{0 < k < \lambda} G(x; P_k)\, n^{-k/\alpha} + O(n^{-\lambda/\alpha}), \qquad (2.5)$$

uniformly for all real $x$, where $\lambda = \min(1, \gamma - \alpha)$. The summation is extended over all $k$ given by (2.3), which satisfy the inequality $0 < k < \lambda$.

Theorem 2: Suppose that $\alpha$, $\beta$ and $\gamma$ are not integers. If $\gamma - \alpha > 1$, and $F(x)$ satisfies conditions (A) and (C), the summation in the second member of (2.5) can be extended over all $k$ such that $0 < k < \gamma - \alpha$ and the remainder term is $O(n^{-(\gamma - \alpha)/\alpha})$.
Remark (1): If $\alpha$ or $\beta$ (or both) is an integer, powers of $\log t$ will occur under the integral sign in (2.4), and the terms of the asymptotic expansion in (2.5) will be multiplied by polynomials in $\log n$. If $\gamma$ is an integer, the majorants of the remainder terms in Theorems 1 and 2 must be multiplied by $\log n$. Since the explicit formulas for these cases are somewhat cumbersome, we shall restrict ourselves to this remark.

Remark (2): Note that, in the case of Theorem 2, we necessarily have $\gamma > 1$, so that condition (B) does not apply.

Remark (3): In particular cases, some of the $G(x; P_k)$ may be identically zero. An extreme example of this phenomenon is obtained by taking $F(x) = G_{\alpha}(x)$. For appropriately chosen normalizing constants, $F_n(x)$ will then be identical with $G_{\alpha}(x)$.
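
To see which fractional powers of $1/n$ can occur in the expansion, the exponents $k$ of the form (2.3) below a given bound can simply be enumerated. The sketch below assumes the reading $k = k_1\alpha + k_2(\beta-\alpha) + k_3(2-\alpha) + k_4$ of (2.3); the function name and the sample values of $\alpha$ and $\beta$ are hypothetical, and some listed exponents may correspond to terms that vanish identically.

```python
from itertools import product


def expansion_exponents(alpha: float, beta: float, upper: float) -> dict:
    """Map each k with 0 < k < upper to the index tuples (k1, k2, k3, k4) producing it."""
    steps = (alpha, beta - alpha, 2 - alpha, 1.0)
    # No single index can exceed upper/step, so these ranges are exhaustive.
    limits = [int(upper / step) + 1 for step in steps]
    found = {}
    for ks in product(*(range(limit + 1) for limit in limits)):
        if not any(ks):
            continue  # at least one index must differ from zero
        k = sum(n * step for n, step in zip(ks, steps))
        if k < upper:
            found.setdefault(round(k, 12), []).append(ks)
    return dict(sorted(found.items()))


# Illustrative values: alpha = 1.5, beta = 1.8, bound lambda = 1.
for k, index_sets in expansion_exponents(1.5, 1.8, 1.0).items():
    print(k, index_sets)
```

For these values the admissible exponents are 0.3, 0.5, 0.6, 0.8 and 0.9, so the terms of (2.5) would decrease as $n^{-k/\alpha}$ for each of these $k$.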


3. SOME LEMMAS 
Before we can proceed to the proofs of the main theorems, we shall have to 
state and prove a number of lemmas. 
Lemma 1: The function $G(x; P_k)$ defined by (2.4) is an everywhere continuous real-valued function of the real variable $x$, which has derivatives of all orders, tends to zero as $x \to \pm\infty$, and is of bounded variation over the whole real axis. Moreover, we have for $t > 0$

$$\int_{-\infty}^{\infty} e^{itx}\, d_x G(x; P_k) = t^{k+\alpha} P_k(t^{\alpha})\, g_{\alpha}(t). \qquad (3.1)$$

For $t < 0$, the integral evidently takes the complex conjugate of its value for $-t > 0$.

It will be seen from the expression (2.2) of $g_{\alpha}(t)$ that the integral in the second member of (2.4), as well as all the integrals obtained by repeated differentiation with respect to $x$, are absolutely and uniformly convergent for all real $x$, and tend to zero as $x \to \pm\infty$. It thus only remains to show that $G(x; P_k)$ is of bounded variation, and that we have the relation (3.1).

In order to prove the bounded variation property, it is sufficient to show 
the integral 


J(«) = 
+ (—o0, 00). The derivative J'(e) can be calculated by 


is of bounded variation over 
e integral sign, and we have to show that J’(«) is abso- 


formal differentiation under t L a 
itely tes over (—co, co). The proof will be different according as q < 1, or 


qg> 1. if g< 1, we must have 0<& <1. For «> 0 we then obtain 
SL 


P -itz j gD fiex [ —ite-* tuigt” \ ii 
—i f olte dv on ( nitg ) —it] dt. 


that, for any 4 > % 
tly, (ten dt 


cog 


S'e) = 

A method used in a similar case by Skorohod (1954) can then be applied to show that the last integral is bounded as $x \to \infty$. Since $|J'(-x)| = |J'(x)|$, we thus have
F(a) = Olle) 

as $|x| \to \infty$, which shows that $|J'(x)|$ is integrable.
For $q \ge 1$, a repeated partial integration shows that
æo 
I(x) =—i J H(t, v)gi(tat, 
0 


t Në 
where A(t, x) = fdu li ve dy, 
d ò 


By elementary calculations we obtain 


Mt 
at, x)|<K 5 


and it then follows from the expression (2.2) for g,(¢) that in this case Jr) = O(a-®) 
so that J'(x) is integrable. 


Thus $G(x; P_k)$ is of bounded variation over the whole real $x$-axis, so that the Stieltjes integral in (3.1) exists. It is then easily seen that (3.1) is the Fourier-Stieltjes transform which is reciprocal to (2.4). Since both integrals are absolutely convergent, (3.1) is certainly valid, and the proof of Lemma 1 is completed.

We now proceed to two lemmas concerning the properties of the c.f. $f_n(t)$ defined by (1.2), for large values of $n$.

Lemma 2: Suppose that $\alpha$, $\beta$ and $\gamma$ are not integers, and that $F(x)$ satisfies conditions (A) and (B). Then it is possible to choose the normalizing constants $A_n$ and $B_n = b\,n^{1/\alpha}$ in (1.1), and the polynomials $P_k$ such that, uniformly for $0 < t < n^{1/\alpha - \delta}$, we have


SAO = emoe TL E tëtë PA He 4 O14 Bye p-r ale, 
O<k<v-a 


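where

$$C = 1 + i\,\frac{c_1 - c_2}{c_1 + c_2}\,\tan\frac{\pi\alpha}{2}, \qquad (3.2)$$

while $\delta$ is a positive constant only dependent on $\alpha$, $k$ is given by (2.3), and the summation is extended over all non-negative integers $k_1, \ldots, k_4$ such that $0 < k < \gamma - \alpha$. For $-n^{1/\alpha - \delta} < t < 0$ we have $f_n(t) = \overline{f_n(-t)}$.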


We use in the 
argument it, with 


degree < y—l, not necessarily the same in different formulas. 


omial Q will be identically zero. 


By partial integration we obtain

$$f(t) = 1 - t\int_0^{\infty} (1 - F(x) + F(-x)) \sin tx\, dx + it\int_0^{\infty} (1 - F(x) - F(-x)) \cos tx\, dx.$$

The integrals from 0 to 1 can be developed in convergent power series in $t$. In the integrals from 1 to $\infty$, we replace $F(-x)$ and $1 - F(x)$ by their expressions according to
condition (A) and then obtain, using the known properties of integrals of the form
J x cosadxand f a” sin wdx, assuming t> 0, F 
f(t) = 1—Lis— Mte — R(t)—itQlit)— Ot), 


where L =| etestile,—eadig a T bë dr, 
2/0 


xa 


` 


$$M = \rho(d_1 + d_2) + i\sigma(d_1 - d_2),$$

$$R(t) = t\int_1^{\infty} (r_1(x) + r_2(x)) \sin tx\, dx + it\int_1^{\infty} (r_1(x) - r_2(x)) \cos tx\, dx.$$

Here $\rho$ and $\sigma$ are real constants only depending on $\beta$. We now show that $R(t)$ is, for small $t > 0$, of the form $itQ(it) + O(t^{\gamma})$. This will be explicitly shown for the cosine term; the sine term can be dealt with in the same way.


Consider first the case $0 < \gamma < 1$. Writing $r(x) = r_1(x) - r_2(x)$, we have,
denoting by $K$ an unspecified constant,


| at 1 
| kri de 
0 g 


dt | r(x) cos ta da 
1 


z pa 

iz ( Z) cos xdr 
t t 

and by the second mean value theorem, assuming $t$ so small that $r(x)$ is monotone for $x > 1/t$,


sKt, 


it T ra) cos te de =|f r (F) cos & r (G) f eos de 


It 


I 


= 


so that our assertion is proved for this case.
Now suppose $\gamma > 1$, and let $2h - 1 < \gamma < 2h + 1$, where $h$ is a positive integer. Then
i tf rf) ( cos ET E E E, 
it J r(x) cos te de = its r(x ( st E Çaji ) ; 


where $Q$ is an even polynomial of degree $2h - 2 < \gamma - 1$. Majorating the cosine difference in two different ways, we find that the first term in the second member is majorated by


ga (de 1 sh—v d F që da 
KU f cos t—l-.. +t PRINT | 27 ckr(ta w+ f x e). 


Since $2h - \gamma > -1$, while $2h - 2 - \gamma < -1$, both integrals are convergent, and our assertion is proved. We thus finally have, as $t > 0$ tends to zero,

$$f(t) = 1 - Lt^{\alpha} - Mt^{\beta} - itQ(it) + O(t^{\gamma}). \qquad (3.3)$$


We now choose the normalizing constants $B_n$ in (1.1) so that

$$B_n = b\,n^{1/\alpha},$$

where $b$ is the positive root of the equation

$$b^{\alpha} = (c_1 + c_2)\int_0^{\infty} \frac{\sin x}{x^{\alpha}}\, dx.$$
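
The scale constant $b$ can be evaluated in closed form. The sketch below assumes the equation for $b$ reads $b^{\alpha} = (c_1 + c_2)\int_0^{\infty} x^{-\alpha}\sin x\,dx$ and uses the classical value $\int_0^{\infty} x^{-\alpha}\sin x\,dx = \Gamma(1-\alpha)\cos(\pi\alpha/2)$ for $0 < \alpha < 2$, $\alpha \ne 1$, with limit $\pi/2$ at $\alpha = 1$. The function names are illustrative.

```python
import math


def sine_power_integral(alpha: float) -> float:
    """Value of the integral of sin(x) / x**alpha over (0, infinity), for 0 < alpha < 2."""
    if not 0 < alpha < 2:
        raise ValueError("alpha must lie strictly between 0 and 2")
    if abs(alpha - 1.0) < 1e-12:
        return math.pi / 2  # Dirichlet integral, the limiting case
    return math.gamma(1 - alpha) * math.cos(math.pi * alpha / 2)


def scale_constant(c1: float, c2: float, alpha: float) -> float:
    """Positive root b of b**alpha = (c1 + c2) * sine_power_integral(alpha)."""
    return ((c1 + c2) * sine_power_integral(alpha)) ** (1 / alpha)


# Cauchy tails c1 = c2 = 1/pi with alpha = 1 give b = 1, i.e. no rescaling.
print(scale_constant(1 / math.pi, 1 / math.pi, 1.0))
print(scale_constant(0.5, 0.5, 1.5))
```

The Cauchy check is a useful sanity test: the standard Cauchy law has tails $1/(\pi x)$ and reproduces itself under the normalization $B_n = n$.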
In (3.3) we replace $t$ by $t/B_n$, and write as an abbreviation

$$\tau = \frac{t}{n^{1/\alpha}}. \qquad (3.4)$$

It will then be seen that as long as $t$ remains confined to the interval $0 < t \le n^{1/\alpha - \delta}$, we shall have $0 < \tau \le n^{-\delta}$, so that $\tau$ tends to zero as $n \to \infty$, uniformly for all $t$ in the interval considered. Hence we obtain, taking account of the value of the constant $L$,

$$f\!\left(\frac{t}{B_n}\right) = 1 - C\tau^{\alpha} - D\tau^{\beta} - i\tau Q(i\tau) + O(\tau^{\gamma}), \qquad (3.5)$$

where $C$ is given by (3.2) while $D$, as well as the coefficients of the polynomial $Q$, is independent of $n$ and $t$. The error term is $O(\tau^{\gamma})$, uniformly for all $t$ in the interval considered.


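We now choose the normalizing constants $A_n$ in (1.1) so that

$$A_n = -nbq,$$

where $q$ is the constant term of the polynomial $Q(i\tau)$ appearing in (3.5). From (1.2) we then obtain

$$f_n(t) = e^{inq\tau}\left[f\!\left(\frac{t}{B_n}\right)\right]^n,$$

and further by means of (3.5)

$$\log f_n(t) = n\log\left[e^{iq\tau} f\!\left(\frac{t}{B_n}\right)\right] = n\log\,[1 - Z + O(\tau^{\gamma})],$$

where

$$Z = (C\tau^{\alpha} + D\tau^{\beta})\,e^{iq\tau} + i\tau Q_1(i\tau), \qquad (3.6)$$

$Q_1(i\tau)$ denoting a polynomial in $i\tau$, with the same properties as before, but without a constant term. Since $\alpha < \beta$ and $\alpha < 2$, it follows that $Z = O(\tau^{\alpha})$, so that $Z \to 0$ as $n \to \infty$. Consequently

$$\log f_n(t) = n\log(1 - Z) + O(n\tau^{\gamma}) = -n\sum_{\nu=1}^{p} \frac{1}{\nu} Z^{\nu} + O(n\tau^{\gamma}),$$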
vel V 


the integer $p$ being taken so that $p\alpha < \gamma \le (p+1)\alpha$. Expanding the power $Z^{\nu}$ by means of (3.6), we obtain a linear aggregate of powers of $\tau$, each term having an exponent of the form

$$j = j_1\alpha + j_2\beta + 2j_3 + j_4 = (j_1 + j_2 + j_3)\alpha + j_2(\beta - \alpha) + j_3(2 - \alpha) + j_4,$$

where the $j_i$ are non-negative integers such that $j_1 + j_2 + j_3 = \nu$. It will be seen that we always have $j > \alpha$, except only in the case when $\nu = j_1 = 1$, $j_2 = j_3 = j_4 = 0$. Thus we may write, replacing $j$ by $k + \alpha$ and using (3.4),

$$\log f_n(t) = -nC\tau^{\alpha} + n\sum_k D_k \tau^{k+\alpha} + O(n\tau^{\gamma}) = -Ct^{\alpha} + t^{\alpha}\sum_k D_k \tau^{k} + O(t^{\alpha}\tau^{\gamma - \alpha}), \qquad (3.7)$$
where $k$ is given by

$$k = k_1\alpha + k_2(\beta - \alpha) + k_3(2 - \alpha) + k_4, \qquad (3.8)$$

the $k_i$ running through all non-negative integers such that

$$0 < k < \gamma - \alpha, \quad \text{and} \quad k_1 + 1 \ge k_2 + k_3.$$

(The second inequality is another way of writing the evident inequality $j_1 + j_2 + j_3 \ge j_2 + j_3$.) In particular, the powers $\tau^{\alpha}$, $\tau^{\beta - \alpha}$, $\tau^{2 - \alpha}$ and $\tau$ will always appear in the sum in the last member of (3.7), in so far as their exponents are $< \gamma - \alpha$. This remark will be used in a moment.
Taking 

A = min (g, P—a, 2—a) > 0, 
(3.7) may be written 

log f,(t) = —C#+U+V, 

where U = Zi Dyrë = O(iT), 


V = O(t'77-*). 
Tt follows in particular that for all sufficiently large n, and for all tin the interval consi- 


dered, we have 


ee QENË 
usg and  |F|< TË 


Further 


TO gu, ee = e S ¥ oem) poret), 


v—o Vi 


AD e 


Taking here the integer q such that ga < y—# < (1+1), we have, observing that 
according to (3.2) the real part of C is equal to unity, 


fat) = oO" (1+ BS) oe pet se). (8.9) 


Now we have for v= bën I 
U = rÈ Dyyt*+-O(77-*)] E (8.10) 
T 


where the D,, are constants, while £ is given by (3.8) the k; being now certain non- 
re the Dy, are cons 


negative integers satisfying 
‘ s 0 < k< ya. ae (11) 


‘ tive integers satisfyi 3.11). It then easily 
Let k,,..., kj be any given set of non-negative integers satisfying (3.11) hen easily 


follows that we must have 
kitkat katka <% 


that a term in Ukstkatkattës will certainly appear in the sum in the second 
so that a te 


member of (3.9). According to 


power will include a term in 
më = (ne) (18-8) (72-8) ars, 


a previous remark, the expansion (3.10) of this 
On the other hand, it is obvious that no $U^{\nu}$ with $\nu > k_1 + \ldots + k_4$ can include the power $\tau^{k}$ with the given values of the $k_i$. It follows that we may write
$ Që 4 ës pë aha 
py y 5L E EP (18) + OT (te gota paj, z (8,12) 
si v! k 


where the sum in the second member is extended over all $k$ given by (3.8) and satisfying (3.11), while $P_k(t^{\alpha})$ is a polynomial in $t^{\alpha}$ of degree $k_1 + \ldots + k_4 - 1$. Finally, we obtain from (3.4), (3.9) and (3.12)


Flt) = en E t+ EP tyne] Oaet" n-ti-e)la ] 
k 


uniformly for $0 < t < n^{1/\alpha - \delta}$, so that Lemma 2 is proved.

Lemma 3: If $F(x)$ satisfies condition (A) there are positive constants $p$ and $q$ such that

$$|f_n(t)| < e^{-p\,n^{1 - 2\delta}}$$

for $n^{1/\alpha - \delta} \le |t| \le q\,n^{1/\alpha}$.

When $F(x)$ satisfies condition (A), it is obvious that the modulus of the c.f. $f(t)$ cannot reduce to a constant. Hence we must have $|f(t)| < 1$ for all $t \ne 0$ in some neighbourhood of the origin. Thus we can find $g > 0$ such that $|f(t)| \le h < 1$ for $g \le |t| \le 2g$. It then follows from Lemma 1 of my Cambridge Tract (Cramér, 1937) that

—h2 1—h2 
MO < 1— “jë të < exp (— së r) 


for $|t| \le g$. (In the work quoted, this is proved under the assumption that $|f(t)| \le h < 1$ holds for all $|t| \ge g$. The only property used in the proof is, however, that this inequality holds for $g \le |t| \le 2g$.)

Further by (1.2), taking $B_n = b\,n^{1/\alpha}$,


Moll = If (e), 


and thus for nila-ys < [t| gbnile 


II < exp [- a n (ar) < exp (- — ols). 


: 1—h2 
Taking p — Seg 15 gb, Lemma 3 is proved. 


We finally state the following lemma which is contained in one due to Esseen (1944).

Lemma 4: Let $F(x)$ be any d.f. with the c.f. $f(t)$. Let $G(x)$ be a real function of bounded variation over the whole real axis, with the Fourier-Stieltjes transform $g(t)$, such that $G(-\infty) = 0$, $G(+\infty) = 1$, while the derivative $G'(x)$ exists everywhere and satisfies $|G'(x)| \le K$. Suppose that for some positive constants $T$ and $\epsilon$ we have

$$\int_{-T}^{T} \left|\frac{f(t) - g(t)}{t}\right| dt = \epsilon.$$

Then there are positive constants $A$ and $B$ independent of $T$ and $\epsilon$ such that for all real $x$

$$|F(x) - G(x)| \le A\epsilon + \frac{B}{T}.$$

4. PROOFS OF THE THEOREMS

In Lemma 4, we now replace $F(x)$ by $F_n(x)$, and $G(x)$ by

$$G_{\alpha}(x) + \sum_k G(x; P_k)\, n^{-k/\alpha}, \qquad (4.1)$$

the sum being extended over all $k$ given by (2.3) or (3.8), and satisfying

$$0 < k < \lambda = \min(1, \gamma - \alpha).$$

Then $f(t)$ will be replaced by $f_n(t)$, while according to Lemma 1 we have to replace $g(t)$ by

$$e^{-Ct^{\alpha}}\left[1 + \sum_k t^{k+\alpha} P_k(t^{\alpha})\, n^{-k/\alpha}\right]$$

for $t \ge 0$, $C$ being given by (3.2), and by the complex conjugate value of $g(-t)$ for $t < 0$. Further, $G'(x)$ will exist everywhere, and satisfy the inequality $|G'(x)| < K$, where as before $K$ denotes an unspecified constant, independent of $x$ and $n$.


From Lemma 2 we then obtain 


OI qi < Kx, i s (4.2) 
UES nija—1)8 U 


while Lemma 3 gives, taking account of the behaviour of g(t) for large |t], 
mma, 3 gives, 


JOIO) | di < K log ner < K n-0-le, 


ja Vac |e] < guile I 


RE y - KTA 
— grit and € = Kn-* in Lemma 4, which now yields 


Thus we may take 7 
-tja |< Anne 4Br e< Kre (4.3) 


| F(x) — Gala) = Galt; P,)n 
so that Theorem 1 is proved: 


In order to prove Theorem 2, we extend the summation in (4.1) over all $k$ satisfying

$$0 < k < \gamma - \alpha.$$

By Lemma 2, the second member of (4.2) may then be replaced by an expression of the form $Kn^{-(\gamma - \alpha)/\alpha}$. We further observe that, if $F(x)$ satisfies condition (C) there is a constant $r < 1$ such that for $|t| > q\,n^{1/\alpha}$ we have

$$|f_n(t)| \le r^{n}$$

and hence


J Fab) dt < K(r"log n-+-e-"?) < Kura, 
gjej gure E 


Thus in this case we may take $T = n^{(\gamma - \alpha)/\alpha}$ and $\epsilon = Kn^{-(\gamma - \alpha)/\alpha}$ in Lemma 4. The last member of (4.3) will then be replaced by $Kn^{-(\gamma - \alpha)/\alpha}$, and the proof of Theorem 2 is completed.
A SEQUENTIAL PROCEDURE FOR THE BEST POPULATION
By IRWIN GUTTMAN 


McGill University and University of Wisconsin 


SUMMARY. This paper deals with the problem of choosing the best population from a 
collection of k given populations, where the term “best” is defined according to a specific criterion which 
is of interest to the statistician (e.g., largest mean, smallest variance, etc.). A procedure which is of a
sequential nature is proposed and its properties studied. 


1. INTRODUCTION 
A pseudo-sequential procedure is given for determining the “best” population in a group of $k$ given populations, with overall confidence $\beta$. The procedure is sequential in that after each stage a decision is made as to whether sampling is to be continued or not. The procedure is not sequential in the classical statistical sense in that observations taken in previous stages are not used in subsequent stages.

Each stage requires the use of a selection procedure of the Gupta-Sobel kind (see Section 2), with the value of the confidence coefficient $P^*$ varying from stage to stage. The procedure terminates when either one population is left or the conditions of the stopping rules of Section 2 are met. A condition is given for the procedure to terminate with probability one.


2. FORMULATION OF THE PROCEDURE


We are given a collection $\pi$ of $k$ populations $\pi_1, \pi_2, \ldots, \pi_i, \ldots, \pi_k$ in which there exists a so-called “best” population, where the population is defined to be “best” according to a specific definition of the following kind.

Suppose the $\pi_i$ are distributed with probability density function $f(x; \theta_i)$, where possibly $\theta_i$ is vector-valued. Consider a real-valued function of $\theta$, $h = g(\theta)$, where $g$ is known. Then the “best” population is that population which has largest value among the $h_i = g(\theta_i)$, $i = 1, \ldots, k$. We assume there is an ordering of the $(h_1, \ldots, h_k)$ into

$$h_{[1]} \le h_{[2]} \le \cdots \le h_{[k-1]} < h_{[k]}. \qquad (2.1)$$

The sequential procedure proposed below is based on a non-sequential Gupta-Sobel type procedure, which it is now necessary to describe (for examples see Gupta and Sobel, 1960 and Guttman, 1961). A sample of size $n$ independent observations is drawn independently from each population and an “appropriate” selection statistic $S_i$, $i = 1, \ldots, k$, is computed. ($S$ is appropriate in the sense that $E(S)$ is either $g(\theta)$ or a monotone function of $g(\theta)$.) A rule is then formulated of the following type:

Retain population $\pi_i$ if

$$S_i \in \omega_{n,k}(P^*, S), \qquad (2.2)$$

where

(i) $\omega_{n,k}(P^*, S)$ is a random linear set contained in the sample space of $S$, and depends on $S = (S_1, \ldots, S_k)$; it is such that the probability that the “best” population is retained (if this event occurs, it is called a correct selection and denoted by CS) is at least $P^*$. In other words, the probability that the value of $S$ computed from the best population falls in $\omega_{n,k}(P^*, S)$ is at least $P^*$, and

(ii) it is assumed that

$$\inf_{\text{all } \theta_i} P(CS;\ \theta_1, \ldots, \theta_k) = P^*, \qquad (2.3)$$


where SË 


In general, this procedure leaves us with a retained subset of the $k$ populations, and the size of the retained subset could be 1 or 2 or ... or $k$, and in general need not be one. In fact if we let $Y_i$ denote a chance variable which equals 1 if $\pi_i$ is included in the selected subset, and zero otherwise, then we have that

$$E(\text{size of the retained subset}) = E\Big(\sum_{i=1}^{k} Y_i\Big) = \sum_{i=1}^{k} P(\pi_i \text{ is selected}) = \sum_{i=1}^{k} P(S_i \in \omega_{n,k}(P^*, S)). \qquad (2.4)$$
Now the above type of Gupta-Sobel (GS) procedure leaves open a legitimate question, viz., if there are two or more populations in the retained subset, which is the best one? The probability that the “best” one is in the retained subset is of course given in (2.2), i.e., $\Pr(CS) \ge P^*$.

To answer this question we now formulate a sequential procedure (SP). We index the stages of the SP by $t$. Essentially, we use a GS selection rule of the type (2.2) at each stage, and it is interesting to note then, that if a GS selection rule exists, then a sequential procedure of the type described below automatically exists. The stopping rules are as follows.


(A) At stage $t$, use a GS selection rule of type (2.2) with $P^* = P^*_t$ given by

$$P^*_t = 1 - \frac{1 - \beta}{2^{t}} \qquad (2.5)$$

and draw independent samples of size $n_t$ independent observations from each of the populations retained.

(B) Let $k_t$ be the number of populations retained. If the number of populations at stage $t'$, say $k_{t'}$, is greater than one, continue the procedure if and only if $t' < t_0$ where $t_0$ is defined below, and if $k_{t'} = 1$ for some $t' \le t_0$, stop the procedure.

(C) If $M$ units of capital are available to spend on this procedure, and if at stage $t_0$, $k_{t_0} \ne 1$, then stop the procedure, where $t_0$ is the largest integer for which

$$\sum_{t=1}^{t_0} k\, n_t\, d \le M \qquad (2.6)$$

where $d$ is the cost per observation.
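
The economic stopping rule (C) amounts to finding the last stage whose cumulative cost stays within the available capital. The sketch below assumes, as one reading of (2.6), that every stage is charged for all $k$ populations at $n_t$ observations each; the function name and the sample sizes are hypothetical.

```python
def last_affordable_stage(k: int, stage_sizes: list, cost: float, capital: float) -> int:
    """Largest t0 with sum over t <= t0 of k * n_t * cost not exceeding capital.

    stage_sizes[t - 1] is the per-population sample size n_t planned for stage t.
    Returns 0 when even the first stage is unaffordable.
    """
    spent = 0.0
    t0 = 0
    for t, n_t in enumerate(stage_sizes, start=1):
        spent += k * n_t * cost
        if spent > capital:
            break
        t0 = t
    return t0


# Four populations, doubling sample sizes, unit cost, 300 units of capital.
print(last_affordable_stage(4, [10, 20, 40, 80], 1.0, 300.0))
```

Charging for all $k$ populations at every stage is a worst-case budget, since later stages actually sample only the populations still retained.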




Now, let CS denote the event that, when the above SP terminates, we have the best population: if the procedure terminates because of (B), CS is the event that this one population is the best one, while if the procedure is terminated because of the economic condition (C) then we are left with a subset of $\pi$ and CS is the event that the best population is in the retained subset. We now state the following:

Theorem 2.1: When the procedure SP given by (A)-(C) above is terminated, say at stage $t$, then $\Pr(CS)$ satisfies

$$\Pr(CS) \ge \beta.$$

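Proof: Let $A_t$ be the event of a correct selection at stage $t$. Suppose the process terminates at stage $t_1$. Now, the probability that when the procedure stops, the event CS has occurred is the probability of the occurrence of the event

$$A_1 \cap A_2 \cap \cdots \cap A_{t_1}. \qquad (2.7)$$

But we have that

$$P(A_1 \cap \cdots \cap A_{t_1}) = 1 - P(\bar{A}_1 \cup \cdots \cup \bar{A}_{t_1}), \qquad (2.8)$$

where $\bar{A}_t$ is the event that at stage $t$ we do not have a correct selection, i.e. the complement of the event $A_t$; hence we have

$$P(A_1 \cap \cdots \cap A_{t_1}) \ge 1 - \sum_{t=1}^{t_1} P(\bar{A}_t). \qquad (2.9)$$

Now from (2.3), (2.5) and part (A) of the SP, we have that $P(\bar{A}_t) \le (1 - \beta)/2^{t}$, and hence

$$P(A_1 \cap \cdots \cap A_{t_1}) \ge 1 - (1 - \beta)\sum_{t=1}^{t_1} \frac{1}{2^{t}}. \qquad (2.10)$$

But $1 - (1 - \beta)\sum_{t=1}^{t_1} 2^{-t} \ge 1 - (1 - \beta) = \beta$, and so

$$P(A_1 \cap \cdots \cap A_{t_1}) \ge \beta. \qquad (2.11)$$

This completes the proof.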
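
The argument of the proof is a Boole (Bonferroni) bound over the stages, and it can be checked exactly with rational arithmetic. The sketch below assumes a geometric allocation of the error, $P^*_t = 1 - (1-\beta)/2^{t}$, which is one choice consistent with the proof; the function names are hypothetical.

```python
from fractions import Fraction


def stage_confidence(beta: Fraction, t: int) -> Fraction:
    """Assumed stage-t confidence coefficient: the error budget 1 - beta is halved at each stage."""
    return 1 - (1 - beta) / 2 ** t


def overall_lower_bound(beta: Fraction, stages: int) -> Fraction:
    """Boole's inequality: P(all stages correct) >= 1 - sum of stage error bounds."""
    return 1 - sum(1 - stage_confidence(beta, t) for t in range(1, stages + 1))


beta = Fraction(9, 10)
for stages in (1, 3, 10):
    print(stages, overall_lower_bound(beta, stages))
```

Because the stage errors sum to less than $1 - \beta$ however many stages are run, the bound never falls below $\beta$, which is why the theorem also covers an unlimited number of stages.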
3. A CONDITION FOR TERMINATION WITH PROBABILITY ONE 
In this section we assume that there is infinite capital available and thus 
part (C) of the stopping rule of the SP may be ignored. It is interesting to note that Theorem 2.1 still holds, for if $t_1 = \infty$, (2.11) still holds.

Now, we will say that the SP is in state $y$ if at any stage $t$ of the procedure there are exactly $y$ populations retained, where $y = k, \ldots, 1$. From the very nature of the SP, it is clear that the states form a Markov Chain with non-stationary transition probabilities

$$P(k_{t+1} = x \mid k_t = y). \qquad (3.1)$$

We denote these transition probabilities by $p_{yx}(t, t+1)$ and note that these probabilities are dependent on $\omega_{n_t, y}(P^*_t, S)$ and that $1 \le x \le y = k_t \le k$. Note that $p_{yx}(t) = 0$ if $y < x$ and that $\sum_{x=1}^{y} p_{yx} = 1$. We may now state
Theorem 3.1: Consider the Markov Chain with the above structure. Let $p_{xx}(t) = 1 - \delta_x(t)$, $0 < \delta_x(t) < 1$, for $x \ne 1$. Then the Markov Chain is absorbed at state 1 (i.e. the SP terminates at a finite stage) if and only if $\sum_{t=1}^{\infty} \delta_x(t)$ diverges, all $x \ne 1$.


Proof: Let $x \ne 1$. The probability that the procedure remains in state $x$ for at least $l$ stages is clearly

$$p_{xx}(t_0 + 1) \cdots p_{xx}(t_0 + l), \qquad (3.2)$$

where the procedure enters state $x$ at stage $t_0$, and the probability that it stays at state $x$ for an infinite time is

$$\lim_{l \to \infty} p_{xx}(t_0 + 1) \cdots p_{xx}(t_0 + l) = \prod_{l=1}^{\infty} (1 - \delta_x(t_0 + l)), \qquad (3.3)$$

and this probability, i.e., this infinite product, diverges to zero if and only if $\sum_{t=1}^{\infty} \delta_x(t)$ diverges. That is, if $\sum_{t=1}^{\infty} \delta_x(t)$ diverges, all $x$, the probability of remaining in $x$ an infinite time is zero, and because of the structure of the Markov Chain, the SP will thus be absorbed at 1 in a finite time.

Note that we may make the assumption $0 < \delta_x(t) < 1$ since $p_{yx}(t) \ne 0$ or 1, all $x$ and $y$, in view of (2.3).
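
The link between the infinite product and the series can be seen numerically. The sketch below computes the probability of staying in a non-absorbing state for a given number of stages under two illustrative leaving probabilities, one with a divergent sum and one with a convergent sum; both sequences and the function name are assumptions made for the example.

```python
def stay_probability(delta, t0: int, stages: int) -> float:
    """Product of (1 - delta(t0 + l)) for l = 1..stages, i.e. the chance of never leaving the state."""
    prob = 1.0
    for l in range(1, stages + 1):
        prob *= 1.0 - delta(t0 + l)
    return prob


def harmonic(t: int) -> float:
    return 1.0 / (t + 1)          # sum diverges, so the product tends to 0


def squared(t: int) -> float:
    return 1.0 / (t + 1) ** 2     # sum converges, so the product tends to 1/2


print(stay_probability(harmonic, 0, 999))
print(stay_probability(squared, 0, 999))
```

With the divergent sequence the product telescopes to $1/1000$ after 999 stages and keeps shrinking, while with the convergent sequence it settles near one half, so the procedure could remain in that state forever with positive probability.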


As a last remark, we wish to point out that if we have some prior knowledge about the possible differences of the $h_{[i]}$, then it might be possible to use this information to find a “reasonable” value for $n_t$. For consulting (2.4), we have

$$E(\text{size of the retained subset}) = \sum_{i=1}^{k_t} P(S_i \in \omega_{n_t, k_t}(P^*_t, S_t)). \qquad (3.4)$$

But the $P$ function is clearly a function of $n_t$, $k_t$, $P^*_t$ and the $\theta_i$, which is to say, (3.4) is a function of $n_t$, $k_t$, $P^*_t$ and the $h_{[i]}$. In the special cases where this function can be written as a function of $n_t$, $k_t$, $P^*_t$ and the differences $h_{[j]} - h_{[i]}$, $i < j$, and since $k_t$ and $P^*_t$ are known, we can set this function equal to one and solve for the unknown $n_t$.
LONG-CHAIN POLYMERS AND SELF-AVOIDING
RANDOM WALKS 
By J. M. HAMMERSLEY 
Institute of Statistics, University of Oxford 


SUMMARY. Self-avoiding random walks provide a simplified model of long-chain polymers. This paper gives some qualitative theorems on the behaviour of such walks. The analysis will be continued in a subsequent paper.


A polymer macromolecule consists of a long main-chain of atoms and some
lesser side-assemblies (side-chains, side-rings, etc.) like ribs attached to a flexible
backbone. In chemical solution its shape is essentially that of its main-chain, long
and wriggly like a random walk with many (10° or 10') steps. Successive steps are 
of pretty uniform length and random direction; but these directions are not statistically independent, partly because the side-assemblies restrict the freedom of the main-chain and partly because the main-chain interferes with itself. Essentially, the walk is self-avoiding, in as much as the atoms have physical size and no two atoms may occupy the same region of space. This constitutes the so-called excluded volume problem, on which scores of research papers have been written. Indeed, the literature merits a summarizing review article every three years or so: see, for example, Wall and Hiller (1954), Hermans (1957), or Casassa (1960), each with their lengthy bibliographies.
Despite all this, we still know very little about self-avoiding random walks; for the 
problem is extremely difficult. 

The present paper, like many of its predecessors, deals exclusively with 
self-avoiding Pólya walks. They afford the simplest possible model of the excluded volume 
problem: indeed, they are an over-simplification of the chemical reality; but it seems 
presumptuous to embark on more realistic and more complicated kinds of self-avoiding 
walk until we have first learnt how to handle these simplest self-avoiding walks. 

Consider, then, d-dimensional Euclidean space with coordinates represented 
vectorially by $w = (x, y, \dots, z)$. We confine our attention to points of the 
hypercubical lattice in this space: that is to say, a point will hereafter mean a point whose 
coordinates are all (positive, negative, or zero) integers. We say that two points are 
neighbours if they are unit distance apart. We define an n-stepped Pólya walk to be 
an ordered sequence of n+1 points such that each pair of successive points are 
neighbours. We write $w_i = (x_i, y_i, \dots, z_i)$ with $i = 0, 1, \dots, n$ for the successive points of such 
a walk; and $P_n(w)$ for the class of distinct n-stepped Pólya walks with a prescribed 
first point $w_0 = w$. A walk is called self-avoiding if all its points are distinct. 
We write $F_n(w)$ for the self-avoiding subset of $P_n(w)$. Clearly $P_n(w)$ and $F_n(w)$ 
depend upon $w$ only through a trivial translation; and for most purposes we may work 
with $P_n = P_n(0)$ and $F_n = F_n(0)$, where $0 = (0, 0, \dots, 0)$. The number of members 
of $P_n$ is clearly $(2d)^n$. We write $f_n$ for the number of members of $F_n$: there is no 
known explicit formula for $f_n$, though clearly we have the trivial result 

$$d^n \le f_n \le 2d(2d-1)^{n-1}, \tag{1}$$

with equality on the right when d = 1, because no step can be in the opposite 
direction to the previous step. Individual members of $P_n$ and $F_n$ will, by hypothesis, 
have equal probabilities $(2d)^{-n}$ and $1/f_n$ respectively. The distance between the two 
ends of a walk from $F_n$ is 

$$r_n = \sqrt{x_n^2 + y_n^2 + \dots + z_n^2} \tag{2}$$

and this has a mean-square expectation 

$$Q_n = E r_n^2 = d\,E x_n^2 = d \sum_{x=-n}^{n} x^2 f_n(x)/f_n, \tag{3}$$

where $Q_n$ is defined by (3) and $f_n(x)$ denotes the number of members of $F_n$ having 
$x_n = x$. Here, $Q_n$ is a measure of the overall size of a walk: and it is natural to ask 
how it behaves as a function of n; but nothing rigorous has ever been proved about 
$Q_n$ beyond the trivial inequality 

$$0 < Q_n \le n^2, \tag{4}$$

with equality on the right when d = 1. We may compare (4) with the corresponding 
familiar result $Q_n = n$ when the walks are drawn from $P_n$ instead of $F_n$. 

The literature contains three main forms of attack on the problem: they 
are respectively the enumerative, the Monte Carlo, and the analytic attacks. 
In the enumerative attack, one enumerates all members of $F_n$ for some 
given n and hence calculates the corresponding statistics $Q_n$ and the like. However, 
as we shall see presently, 

$$\{(2d-1) - \log(2d-1)\}^n \le f_n \le 2d(2d-1)^{n-1} \tag{5}$$

is an improved form of (1); and this shows that the population $F_n$ is unmanageably 
large unless n is very small. In the most important practical case d = 3, enumeration 
has only been taken as far as n = 11. Enumeration may be either by pencil-and-paper 
calculations, such as Sykes (1961) performs, or by electronic computer, as 
O'Flaherty (1961) and Martin (1962) show. The former alternative, better able to 
exploit the combinatorial symmetries of the problem, has so far proved superior: but 
future improvements in programming techniques and machine speed may change this 
shortly. 

The Monte Carlo attack examines a small random sample of $F_n$. The problem 
of drawing an unbiassed sample is not as simple as it may appear at first sight: but 
recent advances in sampling techniques have allowed calculations with large values of n. 
In particular, notable work has been done by Wall, Hiller, and Atchison 
(1955), Wall and Erpenbeck (1959), and Marcer (1961). Monte Carlo work 
lends some provisional support to the conjecture that $Q_n$ behaves rather like $n^{3/2}$ 
when d = 2, and perhaps like a function increasing a little more rapidly than n (for 
example, $n \log n$) when d = 3. 
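The enumerative attack can be illustrated for very small n by brute force. The following is only a minimal sketch, not the method of any of the authors cited: the function names `saw_endpoints`, `f` and `Q` are hypothetical, and the depth-first search makes no use of the combinatorial symmetries mentioned in the text. It counts the self-avoiding walks from the origin and averages the squared end-to-end distance, so that the bounds in (1) and (4) can be checked directly.

```python
def saw_endpoints(n, d):
    """End points of all n-stepped self-avoiding walks from the origin
    on the d-dimensional hypercubical lattice (brute-force sketch)."""
    # The 2d unit steps of the lattice.
    steps = []
    for axis in range(d):
        for sign in (1, -1):
            step = [0] * d
            step[axis] = sign
            steps.append(tuple(step))

    origin = (0,) * d
    ends = []

    def extend(point, visited, remaining):
        # Depth-first search that never revisits a lattice point.
        if remaining == 0:
            ends.append(point)
            return
        for step in steps:
            nxt = tuple(p + s for p, s in zip(point, step))
            if nxt not in visited:
                visited.add(nxt)
                extend(nxt, visited, remaining - 1)
                visited.remove(nxt)

    extend(origin, {origin}, n)
    return ends


def f(n, d):
    """Number f_n of n-stepped self-avoiding walks."""
    return len(saw_endpoints(n, d))


def Q(n, d):
    """Mean-square end-to-end distance Q_n, all walks equally likely."""
    ends = saw_endpoints(n, d)
    return sum(sum(c * c for c in end) for end in ends) / len(ends)


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, f(n, 2), round(Q(n, 2), 4))
```

The running time grows roughly like $f_n$ itself, which is why this naive approach stops being practical after a modest number of steps.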
The analytic attack attempts to handle the problem by pure mathematics. 
All attempts at precise quantitative results, however, encounter insuperable 
mathematical difficulties; and authors, invariably forced to make one or another simplifying 
assumption of more or less doubtful validity, have arrived at a wide variety of 
mutually conflicting, and hence rather unconvincing, approximations. On the other hand, 
analysis has yielded a small number of qualitative results, taking the form of existence 
theorems. For example, Hammersley and Welsh (1962) proved the existence of two 
constants $\kappa$ and $\gamma$, where $\kappa$ depends only upon d and is known as the connective constant, 
and where $\gamma$ is an absolute constant, such that 

$$\kappa n \le \log f_n \le \kappa n + \gamma n^{1/2} + \log d; \tag{6}$$

and Rennie (1961) proved that 2d— logd—c e”, ctw (2) 
where c is the absolute constant 

c = 3— Vi2— log 2+ S- grt) : ses (8) 


re3 rr— log (r+1)] 
The purpose of such existence theorems is to show that certain quantities, in which 
we are interested, have certain functional forms, prescribed apart from one or two 
unknown parameters. For example, (6) shows that $\log f_n$ is very nearly a linear 
function of n when n is large, the slope being an unknown parameter $\kappa$. We then combine 
this kind of information with either an enumerative or a Monte Carlo attack, in the 
former case to select the appropriate extrapolating formula for attaining to larger 
values of n, and in the latter case to select the appropriate statistical hypotheses 
and estimates for fitting to the Monte Carlo samples. This kind of combination 
of attacks seems quite promising: for example, Sykes (1961) used (6) and his 
enumeration to estimate that 

$$e^{\kappa} = \begin{cases} 2.6390 \pm 0.0005 & \text{when } d = 2 \\ 4.152 \pm 0.003 & \text{when } d = 3. \end{cases} \tag{9}$$
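The way an existence theorem such as (6) guides an enumeration can be sketched numerically. Since (6) gives $\kappa n \le \log f_n$, every value $f_n^{1/n}$ is an upper bound for $e^{\kappa}$. The sketch below is an illustration only and is not Sykes's extrapolation procedure: the function name `count_saws` is hypothetical and the walks are counted by plain depth-first search on the square lattice (d = 2).

```python
def count_saws(n, d=2):
    """Number f_n of n-stepped self-avoiding walks (brute-force sketch)."""
    steps = []
    for axis in range(d):
        for sign in (1, -1):
            step = [0] * d
            step[axis] = sign
            steps.append(tuple(step))

    origin = (0,) * d

    def extend(point, visited, remaining):
        if remaining == 0:
            return 1
        total = 0
        for step in steps:
            nxt = tuple(p + s for p, s in zip(point, step))
            if nxt not in visited:
                visited.add(nxt)
                total += extend(nxt, visited, remaining - 1)
                visited.remove(nxt)
        return total

    return extend(origin, {origin}, n)


# f_1, ..., f_8 on the square lattice.
counts = {n: count_saws(n) for n in range(1, 9)}

# Each f_n^(1/n) is an upper bound for exp(kappa), by the left side of (6).
upper_bounds = {n: counts[n] ** (1.0 / n) for n in counts}

if __name__ == "__main__":
    for n in sorted(counts):
        print(n, counts[n], round(upper_bounds[n], 4))
```

For these small n the bounds lie well above the d = 2 value quoted in (9), which is why a raw enumeration needs an extrapolating formula before it yields a usable estimate.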

The present paper is an analytic attack of this qualitative kind, aiming at existence 
theorems which will generalize (6) and assist an enumerative or Monte Carlo attack on 
$Q_n$. We shall make some progress in this direction; but the results are very much 
less satisfactory than we might expect. Nevertheless, the results seem to be heading 
in the right direction; and this paper will serve its purpose if it stimulates further and 
more comprehensive research. 

Specifically, we shall try to investigate how $f_n(x)$ behaves for large values of 
n. If this investigation could be successfully carried through, it would yield all we 
need to know about $Q_n$; for we know that 

$$f_n = \sum_{x=-n}^{n} f_n(x), \tag{10}$$

and we could then calculate $Q_n$ from (3) and (10). We shall find ourselves predicting how 
$f_n(x)$ behaves when x increases like a multiple of n. Our approach is in fact a suitable 
generalization to dependent variables of Blackwell and Hodges' (1959) treatment of the 
extreme tail of a convolution. Unfortunately, this is not the most interesting range of 
values of x. To predict the behaviour of $Q_n$, we really need to know how $f_n(x)$ behaves 
when $x/n \to 0$, and perhaps in particular when x is a multiple of $\sqrt{n}$. This may call 
for some sort of generalization of the central limit theorem; and we are unable to 
supply this. 

Nevertheless, our tools in handling $f_n(x)$ will be very similar to those used 
customarily in the central limit theorem. We define 

$$H_n(t) = \sum_{x} e^{tx} f_n(x). \tag{11}$$

Here t is a real number: it has to be real because we have to use inequalities rather 
than equations in the analysis. Clearly $H_n(0) = f_n$; so that $H_n(t)/H_n(0)$ is the familiar 
moment-generating function of the random variable $x_n$. We also define, for $0 \le \beta \le 1$, 

$$H_n(\beta, t) = \sum_{x \ge \beta n} e^{tx} f_n(x). \tag{12}$$

Let $F^*_n$ denote the subclass of walks $(x_i, y_i, \dots, z_i)$ of $F_n$ such that $0 = x_0 < x_i \le x_n$ 
for $i = 1, 2, \dots, n$: and let $f^*_n(x)$ be the number of members of $F^*_n$ which have $x_n = x$. 
We define $H^*_n(t)$ and $H^*_n(\beta, t)$ by replacing $f_n(x)$ by $f^*_n(x)$ in (11) and (12). Of course, we adopt 
the convention that the number of members of an empty class is zero: thus 
$f^*_n(x) = 0$ if $x < 0$. 

Theorem 1: 

$$\sum_{\xi} f^*_m(\xi) f^*_n(x-\xi) \le f^*_{m+n}(x) \le f_{m+n}(x) \le \sum_{\xi} f_m(\xi) f_n(x-\xi). \tag{13}$$

$$H^*_m(\alpha, t)\, H^*_n(\beta, t) \le H^*_{m+n}\!\left(\frac{\alpha m + \beta n}{m+n},\, t\right) \le H_{m+n}(t) \le H_m(t)\, H_n(t). \tag{14}$$

$$e^{nt} \le H^*_n(\beta, t) \le H_n(t) \le (2d\, e^{|t|})^n. \tag{15}$$


Proof of Theorem 1: Consider a given member of $F^*_m$ with $x_m = \xi$ and a given 
member of $F^*_n$ with $x_n = x - \xi$. If we translate (bodily without rotation) the latter 
walk until its first point coincides with the last point of the former walk, we obtain 
a walk belonging to $F^*_{m+n}$ with $x_{m+n} = x$. Since each distinct pair of members from 
$F^*_m$ and $F^*_n$ will yield a distinct member of $F^*_{m+n}$, the left-hand side of (13) follows. 
The right-hand side follows similarly upon considering any given member of $F_{m+n}$ and 
translating its last n steps until they belong to $F_n$, retaining its first m steps as a 
member of $F_m$. Distinct members of $F_{m+n}$ will yield distinct pairs of members of $F_m$ 
and $F_n$. The central inequality of (13) is trivial. From (13) we deduce 

$$\sum_{x \ge \alpha m} e^{tx} f^*_m(x) \sum_{y \ge \beta n} e^{ty} f^*_n(y) \le \sum_{z \ge \alpha m + \beta n} e^{tz} \sum_{\xi} f^*_m(\xi) f^*_n(z - \xi) \le \sum_{z \ge \alpha m + \beta n} e^{tz} f^*_{m+n}(z), \tag{16}$$

which is equivalent to the left-hand side of (14). Similarly the right-hand side of (14) 
follows from the right-hand side of (13): and the central inequality of (14) is trivial. 
The inequalities in (15) follow from (1) and the consideration that the walk $(i, 0, 0, \dots, 0)$ 
$(i = 0, 1, \dots, n)$ contributes to $H^*_n(\beta, t)$ for all $\beta \le 1$. 
Moreover this is the only walk which contributes when $\beta = 1$. 
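The right-hand inequalities of Theorem 1 can be checked by brute force for small m and n. The sketch below is an illustration under stated assumptions, not part of the proof: the helper names `x_distribution` and `H` are hypothetical, the lattice is taken to be the square lattice (d = 2), and the walks are listed by depth-first search. It tabulates $f_n(x)$, the number of self-avoiding walks ending with first coordinate x, and evaluates $H_n(t)$ from (11).

```python
from collections import Counter
import math


def x_distribution(n, d=2):
    """Counter mapping x to f_n(x), the number of n-stepped self-avoiding
    walks whose last point has first coordinate x (brute-force sketch)."""
    steps = []
    for axis in range(d):
        for sign in (1, -1):
            step = [0] * d
            step[axis] = sign
            steps.append(tuple(step))

    origin = (0,) * d
    dist = Counter()

    def extend(point, visited, remaining):
        if remaining == 0:
            dist[point[0]] += 1
            return
        for step in steps:
            nxt = tuple(p + s for p, s in zip(point, step))
            if nxt not in visited:
                visited.add(nxt)
                extend(nxt, visited, remaining - 1)
                visited.remove(nxt)

    extend(origin, {origin}, n)
    return dist


def H(dist, t):
    """H_n(t) as defined in (11), for a tabulated f_n(x)."""
    return sum(math.exp(t * x) * count for x, count in dist.items())


if __name__ == "__main__":
    f2, f3, f5 = x_distribution(2), x_distribution(3), x_distribution(5)
    for x in sorted(f5):
        convolution = sum(f2[xi] * f3[x - xi] for xi in f2)
        print(x, f5[x], convolution)   # f_5(x) never exceeds the convolution
    print(H(f5, 0.5), H(f2, 0.5) * H(f3, 0.5))
```

A `Counter` returns zero for a missing key, which matches the convention that an empty class has no members.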
Theorem 2: If $A_n$ is defined for $n = 1, 2, \dots$ and satisfies 

$$0 < a^{m+n} \le A_{m+n} \le A_m A_n \tag{17}$$

for some constant a, then 

$$\lim_{n \to \infty} A_n^{1/n} = \inf_{n \ge 1} A_n^{1/n}. \tag{18}$$

Alternatively, if $A_n$ satisfies 

$$0 \le A_m A_n \le A_{m+n} \le a^{m+n}, \tag{19}$$

then 

$$\lim_{n \to \infty} A_n^{1/n} = \sup_{n \ge 1} A_n^{1/n}. \tag{20}$$

The existence of the limits in (18) and (20) is part of the conclusion of the theorem. 


Proof of Theorem 2: The fundamental theorem [Hille (1948) Theorem 6.6.1] 
on subadditive functions states that the inequality 

$$B_{m+n} \le B_m + B_n \tag{21}$$

implies 

$$\lim_{n \to \infty} B_n/n = \inf_{n \ge 1} B_n/n, \tag{22}$$

where the limit on the left exists, although it may be formally equal to $-\infty$. Theorem 2 
now follows upon taking $B_n = \pm \log A_n$. 

Theorem 2 is a key theorem in what follows; and, since it is really only a 
modified form of the fundamental theorem on subadditive functions, it is really the 
latter theorem that supports the whole analysis. To this extent it is worth mentioning 
that a generalized version of the fundamental theorem on subadditive functions exists; 
see Hammersley (1962) for details. This generalized version is not used in the present 
paper; but it might well prove useful in a more penetrating study of self-avoiding 
walks. 

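Theorem 3: There exist functions $\theta(t)$ and $\theta(\beta, t)$, both independent of n, 
such that 

$$H_n(t) = \exp\{n[\theta(t) + o_n]\} \tag{23}$$

$$H^*_n(\beta, t) = \exp\{n[\theta(\beta, t) - o_{\beta,n}]\} \tag{24}$$

where $o_n$ and $o_{\beta,n}$ are non-negative and tend to zero as $n \to \infty$. Further, 

$$t \le \theta(\beta, t) \le \theta(t) \le |t| + \log 2d. \tag{25}$$

Proof of Theorem 3: Put $\alpha = \beta$ in Theorem 1, and apply Theorem 2 
and (15). In fact 

$$\theta(t) = \inf_{n \ge 1} n^{-1} \log H_n(t); \qquad \theta(\beta, t) = \sup_{n \ge 1} n^{-1} \log H^*_n(\beta, t). \tag{26}$$

Theorem 4: There exists an absolute constant $\gamma$ such that 

$$H_n(\beta, t) \le d\, e^{n\theta(\beta, t) + \gamma\sqrt{n}}, \quad (t \ge 0); \tag{27}$$

$$H_n(\beta, t) \le d\, e^{n\theta(\beta, 0) + \beta n t + \gamma\sqrt{n}}, \quad (t \le 0). \tag{28}$$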
Proof of Theorem 4: For any given self-avoiding walk $\{(x_i, y_i, \dots, z_i)\}_{i=0,1,\dots,n}$ 
let $\rho$ be the smallest integer satisfying $x_\rho = \min_i x_i$ and let $\sigma$ be the largest 
integer satisfying $x_\sigma = \max_i x_i$. We call $x_\sigma - x_\rho$ the extent of the walk; and we 
call $\{(x_i, y_i, \dots, z_i)\}_{i=0,1,\dots,\rho}$ and $\{(x_i, y_i, \dots, z_i)\}_{i=\sigma,\sigma+1,\dots,n}$ respectively the lower and 
upper tails of the walk. By a lower (upper) unfolding we mean the transformation 
which reflects the lower (upper) tail in the hyperplane $x = x_\rho$ (hyperplane $x = x_\sigma$) 
and which leaves the rest of the walk unchanged. Clearly, unfolding maps an 
n-stepped self-avoiding walk into an n-stepped self-avoiding walk, and it increases 
the walk's extent by the extent of the reflected tail. 

If we iterate the upper unfoldings of some given walk, and write $X_0$, $X_0 + X_1$, 
$X_0 + X_1 + X_2, \dots$ for the extent of the original walk and its successive unfoldings, it 
is easy to see that 

$$X_1 \ge X_2 \ge \dots \ge 0. \tag{29}$$

In fact $X_j$ will reach zero before (and usually long before) j reaches n; but that need 
not concern us here. 


When further iterations yield no fresh walk (i.e. when X; = 0), 
we may repeat the procedure with lower unfoldings, thus giving walks with successive 


extents X,+X,+ ... “and Regs ves Mg y A aes Teg a sy, Haake 


where EA pS OP in Oe OG cull 


At this stage the walk is fully unfolde ails have zero extent. 
Hence, if the walk has become CA as Z)i=ou,..., n We have X; < X$ < Xi 
Finally precede this walk by the single point (w—1, ys,..., zg) and then translate 
this augmented walk (bodily without rotation) until its first point coincides with the 
origin, 


(30) 
d on each side, and bothits t 


When the original w 


of F, onto Kra We now show that at least one and at most 4e%/” distinct members 
of F,, map in this way into any given member of Fis, where y is an absolute constant. 
Let W be the Specified walk of Fri. and Suppose that X+-1 is its extent. By omitting 
the first step of W and translating the remainder to start at the origin we obtain one 
member of F, which maps into W. On the other hand, let Kas Rony say Kis; Ko, TË, cn KË 
be any integral solution of i 


= Mt Ket} KANG. pk 


which also satisfies (29) and (30). We can now fold up the residue of W 
its first step) in a manner intended to restore the origin 


m a ; kae z 4-left folding to be a reflection in the hyperplane Pa of all that 
në TË a S a» Where P 4 lies a distance A to the right of the left-} 
po pë si ; i ght-folding to be a reflection in P 

gat of Pz, where P 3 e B to the left of the ri 
E consists of X; 
ni tight-folding 
origin. 


alk is a member of Fa the above procedure is a mapping 


(31) 


(after omitting 
ating member of F,. To be 


and support 
g Of what lies to 
ght-hand support hyperplane 
-left-folding, X n-rleft-folding, nj Xa-left- 
> e in that order. Finally, we translate 
folded image of a self-avoiding walk 
Ons of (29), (80) and (31) may not 
bers of Fa yield all thoge mem i pn so ne 

n W's 
ap into W does not exceed 
er of solutions of (29) and 


folding, X,-tight-folding, xX 
the result to start at the 


bers of F, which m 
(30) and (31). The numb: 
(31) is $P(X)$, the number of partitions of X. Similarly there are $P(X)$ solutions of 
(30) and (31). Hence the number of solutions of (29), (30) and (31) does not exceed 

$$[P(X)]^2 \le [P(n)]^2 \le \tfrac{1}{2} e^{\gamma\sqrt{n}} \tag{32}$$

for some $\gamma$ because 

$$\log P(n) \sim \pi\sqrt{2n/3} \quad \text{as } n \to \infty \tag{33}$$

according to Hardy and Ramanujan (1917). 
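The growth rate invoked in (33) can be checked numerically. The following sketch is an illustration only: the function name `partition_numbers` is hypothetical, and the comparison uses the standard form of the Hardy and Ramanujan asymptotic, $\log P(n) \sim \pi\sqrt{2n/3}$, which is assumed here to be the result the text intends. The partition numbers are built by the usual dynamic programme over the allowed part sizes.

```python
import math


def partition_numbers(n_max):
    """Return the list [P(0), P(1), ..., P(n_max)] of partition numbers."""
    p = [1] + [0] * n_max
    # Allow parts of size 1, 2, ..., n_max in turn.
    for part in range(1, n_max + 1):
        for total in range(part, n_max + 1):
            p[total] += p[total - part]
    return p


def log_ratio(p, n):
    """log P(n) divided by its asymptotic value pi * sqrt(2n/3)."""
    return math.log(p[n]) / (math.pi * math.sqrt(2.0 * n / 3.0))


if __name__ == "__main__":
    p = partition_numbers(1000)
    for n in (10, 100, 1000):
        print(n, p[n], round(log_ratio(p, n), 4))
```

The ratio approaches 1 only slowly, but what the proof needs is just that $\log P(n)$ grows no faster than a multiple of $\sqrt{n}$.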

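Now let $g_n(x, X)$ denote the number of distinct walks in $F_n$ having $x_n = x$ 
and also having fully unfolded images of extent $X + 1$ in $F^*_{n+1}$. What we have so far 
proved amounts to the inequality 

$$\sum_{x} g_n(x, X) \le \tfrac{1}{2} e^{\gamma\sqrt{n}} f^*_{n+1}(X+1). \tag{34}$$

The summation on the left of (34) may be restricted to $x \le X$ since $g_n(x, X)$ is zero 
otherwise. Now, if $t \ge 0$, 

$$\begin{aligned}
H_n(\beta, t) &= \sum_{x \ge \beta n} e^{tx} f_n(x) = \sum_{X \ge x \ge \beta n} e^{tx} g_n(x, X) \\
&\le \sum_{X \ge x \ge \beta n} e^{tX} g_n(x, X) \le \sum_{X \ge \beta n} e^{tX} \sum_{x} g_n(x, X) \\
&\le \tfrac{1}{2} e^{\gamma\sqrt{n}} \sum_{X \ge \beta n} e^{tX} f^*_{n+1}(X+1) = \tfrac{1}{2} e^{\gamma\sqrt{n} - t} \sum_{X \ge \beta n} e^{t(X+1)} f^*_{n+1}(X+1) \\
&\le \tfrac{1}{2} e^{\gamma\sqrt{n} - t} \sum_{X+1 \ge \beta(n+1)} e^{t(X+1)} f^*_{n+1}(X+1) \\
&= \tfrac{1}{2} e^{\gamma\sqrt{n} - t} H^*_{n+1}(\beta, t) \le \tfrac{1}{2} e^{\gamma\sqrt{n} - t + (n+1)\theta(\beta, t)} \\
&\le \tfrac{1}{2} \exp\{\gamma\sqrt{n} - t + n\theta(\beta, t) + t + \log 2d\} = d\, e^{n\theta(\beta, t) + \gamma\sqrt{n}}.
\end{aligned} \tag{35}$$

In the derivation of (35) we have used (34), the inequality $0 \le \beta \le 1$, (24), and (25). 
If $t \le 0$, we have instead 

$$H_n(\beta, t) = \sum_{x \ge \beta n} e^{tx} f_n(x) \le e^{\beta n t} \sum_{x \ge \beta n} f_n(x) = e^{\beta n t} H_n(\beta, 0); \tag{36}$$

and we can now complete the analysis by applying (35) with t = 0. 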
Theorem 5: $\theta(\beta, t)$ is a concave function of $\beta$ for each fixed t; that is to say 

$$\theta(p\alpha + q\beta, t) \ge p\,\theta(\alpha, t) + q\,\theta(\beta, t), \quad (0 \le \alpha \le 1,\ 0 \le \beta \le 1) \tag{37}$$

whenever $p \ge 0$, $q \ge 0$ and $p + q = 1$. Moreover $\theta(\beta, t)$ is a continuous function 
of $\beta$ in $0 < \beta < 1$. 

Proof of Theorem 5: Put m = n in (14), substitute from (24), take logarithms, 
divide by 2n, and let $n \to \infty$ to obtain 

$$\tfrac{1}{2}\theta(\alpha, t) + \tfrac{1}{2}\theta(\beta, t) \le \theta(\tfrac{1}{2}\alpha + \tfrac{1}{2}\beta, t) \tag{38}$$

for any fixed $\alpha$, $\beta$. Since (25) shows $\theta$ to be bounded, (38) implies the stated results 
by the theory of convex functions: see Hardy, Littlewood, and Pólya 
(1934) §3.18. 


Theorem 6: 

$$\theta(t) = \theta(0, |t|); \tag{39}$$

and 

$$e^{n\theta(t)} \le H_n(t) \le 2d\, e^{n\theta(t) + \gamma\sqrt{n}}. \tag{40}$$

Proof of Theorem 6: The definition of $f_n(x)$ shows it to be an even function 
of x, by symmetry. Hence $H_n(t)$ and $\theta(t)$ are even functions of t, and it suffices 
to suppose $t \ge 0$. Since $H^*_n(0, t) \le H_n(t) \le 2H_n(0, t)$ when $t \ge 0$, the result follows 
from (23), (24), and (27). 

Theorem 7: For real t, let 

$$K(t) = \log E(e^{tX}) \tag{41}$$

be the cumulant generating function of a bounded random variable X, whose possible 
values are integers with greatest common divisor 1. Let b be any number satisfying 

$$EX < b < \sup X, \tag{42}$$

and define 

$$\mu(b) = \inf_{t} \{K(t) - bt\}. \tag{43}$$

Then 

$$\text{prob}\left\{\sum_{i=1}^{N} X_i \ge Nb\right\} = \exp\{N\mu(b) + o_b(N)\} \quad \text{as } N \to \infty, \tag{44}$$

where $X_i$ $(i = 1, 2, \dots)$ is a sequence of independent random variables each distributed 
like X. 

Proof of Theorem 7: This theorem is simply a considerably weakened form 
of the main theorem in Blackwell and Hodges (1959), rewritten in notation 
appropriate to our present needs. 

Theorem 8: Let $K_n(t)$ and $\mu_n(b)$ denote the functions $K(t)$ and $\mu(b)$, which 
occur in Theorem 7, in the particular case when $X = x_n$, $\{(x_i, y_i, \dots, z_i)\}_{i=0,1,\dots,n}$ being 
a random member of $F_n$. Then 

$$n\theta(\beta, 0) \le \mu_n(n\beta) + \log f_n, \quad (0 \le \beta \le 1). \tag{45}$$
Proof of Theorem 8: We have 

$$0 = E x_n < b < \sup x_n = n; \tag{46}$$

so we may take 

$$b = \beta n, \quad 0 < \beta < 1. \tag{47}$$

The number of Pólya walks, having nN steps, starting from the origin, consisting of 
N independent pieces, where each piece belongs to $F_n$ (apart from a trivial translation), 
and finishing to the right of the hyperplane $x = \beta nN$, is the left-hand side of 

$$f_n^N\, \text{prob}\left\{\sum_{i=1}^{N} X_i \ge \beta nN\right\} \ge H^*_{nN}(\beta, 0); \tag{48}$$

and the right-hand side of (48) is the number which also belong to 
$F^*_{nN}$. This proves (48). Now substitute (44) into the left-hand side of (48) and 
substitute (24) (with t = 0) into the right-hand side, namely 

$$H^*_{nN}(\beta, 0) = \exp\{nN[\theta(\beta, 0) - o_{\beta, nN}]\}. \tag{49}$$

Take logarithms of the combined results to obtain 

$$N\mu_n(\beta n) + o(N) + N \log f_n \ge nN[\theta(\beta, 0) - o_{\beta, nN}]. \tag{50}$$

Divide (50) by N and let $N \to \infty$. This proves (45). 

Theorem 9: $\theta(\beta, 0)$ is a non-increasing continuous concave function of $\beta$ for 
$0 \le \beta \le 1$, and it satisfies 

$$0 \le \theta(\beta, 0) \le \log 2 - (1-\beta)\log(1-\beta) - \log(1+\beta) + \log(d-1+\delta) - \beta \log(\beta d - \beta + \delta), \tag{51}$$

where 

$$\delta = +\sqrt{\beta^2 d(d-2) + 1}. \tag{52}$$

Proof of Theorem 9: The definition of $H^*_n(\beta, 0)$ shows that it is a 
non-increasing function of $\beta$; and then (26) implies the same for $\theta(\beta, 0)$. Because $\theta(\beta, 0)$ 
is non-increasing and concave for $0 \le \beta \le 1$, by virtue of Theorem 5, it must be 
continuous at $\beta = 0$. By (25), $\theta(\beta, 0)$ is non-negative. The continuity of $\theta(\beta, 0)$ 
at $\beta = 1$ follows from this and from (51), whose right-hand side tends to zero as $\beta$ 
tends to 1 from below. Consequently, it only remains to prove (51), and this is merely 
the special case n = 1 of (45). For, when n = 1 all the Pólya walks are automatically 
self-avoiding. Hence 

$$2d \exp\{K_1(t)\} = 2(d-1) + 2\cosh t; \tag{53}$$

and an elementary calculation shows that 

$$\mu_1(\beta) = K_1(\tau) - \beta\tau, \tag{54}$$

where $\tau$ is the root of 

$$\sinh \tau = \beta(d - 1 + \cosh \tau). \tag{55}$$

On eliminating $\tau$ from (54) and (55) and substituting into (45), we get (51). 
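The elimination of $\tau$ can be verified numerically. The sketch below is only a check under stated assumptions: the function names `bound_numeric` and `bound_closed_form` are hypothetical, and the right-hand side of (51) is taken in the form $\log 2 - (1-\beta)\log(1-\beta) - \log(1+\beta) + \log(d-1+\delta) - \beta\log(\beta d - \beta + \delta)$ with $\delta = \sqrt{\beta^2 d(d-2)+1}$. The first function solves (55) by bisection and evaluates $\mu_1(\beta) + \log f_1 = \log\{2(d-1) + 2\cosh\tau\} - \beta\tau$; the second evaluates the closed form.

```python
import math


def bound_numeric(beta, d):
    """Solve sinh(tau) = beta * (d - 1 + cosh(tau)) by bisection and
    return log(2(d-1) + 2 cosh(tau)) - beta * tau, for 0 < beta < 1."""
    def slope(t):
        # Derivative of K_1(t) - beta * t; it is increasing in t.
        return math.sinh(t) / (d - 1 + math.cosh(t)) - beta

    lo, hi = 0.0, 1.0
    while slope(hi) < 0:          # bracket the root
        hi *= 2.0
    for _ in range(200):          # bisection
        mid = 0.5 * (lo + hi)
        if slope(mid) < 0:
            lo = mid
        else:
            hi = mid
    tau = 0.5 * (lo + hi)
    return math.log(2 * (d - 1) + 2 * math.cosh(tau)) - beta * tau


def bound_closed_form(beta, d):
    """Closed form assumed for the right-hand side of (51)."""
    delta = math.sqrt(beta * beta * d * (d - 2) + 1)
    return (math.log(2)
            - (1 - beta) * math.log(1 - beta)
            - math.log(1 + beta)
            + math.log(d - 1 + delta)
            - beta * math.log(beta * d - beta + delta))


if __name__ == "__main__":
    for d in (2, 3):
        for beta in (0.1, 0.5, 0.9):
            print(d, beta, bound_numeric(beta, d), bound_closed_form(beta, d))
```

The two columns agree to rounding error, and the closed form shrinks towards zero as $\beta$ approaches 1, as the continuity argument requires.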
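Theorem 10: 

$$\mu_n(n\beta) + \log f_n \le n\theta(\beta, 0) + \gamma\sqrt{n} + \log[d(2n+1)]. \tag{56}$$

Proof of Theorem 10: From (43) and the definitions of $K_n(t)$ and $H_n(t)$ 
we have 

$$\begin{aligned}
f_n \exp\{\mu_n(n\beta)\} &= f_n \exp\left(\inf_{t} [K_n(t) - n\beta t]\right) = \inf_{t} e^{-n\beta t} H_n(t) \\
&= \inf_{t} \sum_{x=-n}^{n} e^{(x - n\beta)t} f_n(x) \le \inf_{t \ge 0} \sum_{x=-n}^{n} e^{(x - n\beta)t} f_n(x) \\
&\le (2n+1) \inf_{t \ge 0} \sup_{x \ge 0} e^{(x - n\beta)t} f_n(x).
\end{aligned} \tag{57}$$

In the last step of (57) the factor (2n+1) is the number of terms in the sum $\sum_{x=-n}^{n}$; and 
the supremum should accordingly be over $-n \le x \le n$. However, we may ignore 
the negative values of x, since $t \ge 0$ and $f_n(x)$ is an even function of x. From (57) 

$$f_n \exp\{\mu_n(n\beta)\} \le (2n+1) \inf_{t \ge 0} \sup_{x \ge 0} e^{(x - n\beta)t} \sum_{y \ge x} f_n(y) = (2n+1) \inf_{t \ge 0} \sup_{x \ge 0} e^{(x - n\beta)t} H_n(x/n, 0). \tag{58}$$

In (58) put $x = n\alpha$, where we may suppose $0 \le \alpha \le 1$. Thus 

$$f_n \exp\{\mu_n(n\beta)\} \le (2n+1) \inf_{t \ge 0} \sup_{0 \le \alpha \le 1} e^{(\alpha - \beta)nt} H_n(\alpha, 0) \le (2n+1)\, d \inf_{t \ge 0} \sup_{0 \le \alpha \le 1} e^{(\alpha - \beta)nt + n\theta(\alpha, 0) + \gamma\sqrt{n}} \tag{59}$$

by virtue of (27). Consequently 

$$\begin{aligned}
\log f_n + \mu_n(n\beta) - \gamma\sqrt{n} - \log[(2n+1)d] &\le n \inf_{t \ge 0} \sup_{0 \le \alpha \le 1} [(\alpha - \beta)t + \theta(\alpha, 0)] \\
&\le n \sup_{0 \le \alpha \le 1} [\theta(\alpha, 0) + (\alpha - \beta)(-\theta'_\beta)],
\end{aligned} \tag{60}$$

where $\theta'_\beta$ is a one-sided derivative of $\theta(\beta, 0)$ at $\beta$. We may take, for definiteness, 
a right-hand derivative unless $\beta = 1$. The concavity of $\theta(\beta, 0)$ ensures the existence 
of $\theta'_\beta$; and $-\theta'_\beta \ge 0$ since $\theta(\beta, 0)$ is non-increasing. Hence the last step in (60) 
consists in replacing t by a particular value, thus not decreasing the right-hand side. 
However, the concavity of $\theta(\beta, 0)$ implies that 

$$\theta(\beta, 0) + (\alpha - \beta)\theta'_\beta \ge \theta(\alpha, 0). \tag{61}$$

Therefore 

$$\log f_n + \mu_n(n\beta) - \gamma\sqrt{n} - \log[(2n+1)d] \le n \sup_{0 \le \alpha \le 1} \theta(\beta, 0) = n\theta(\beta, 0), \tag{62}$$

which completes the proof of Theorem 10. 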
A SIMPLE STOCHASTIC MODEL OF CONTINUOUS CULTURE OF 
MICROORGANISMS IN SEVERAL BASINS 


By JAROSLAV KOZESNIK 
Academy of Sciences, Czechoslovakia 


SUMMARY. The expected values and variances of the number of microorganisms in cultivation 
basins, and the formulae for both the steady and the transient state, are given. The condition 
for the culture to stabilise on the non-zero expected values has also been calculated. 


1. INTRODUCTION 


The purpose of this paper is to give a stochastic model of the development 
of a population of microorganisms when cultured in several successively joined basins 
through which the cultivation fluid flows. We assume that the external conditions 
(e.g. the temperature, the contents of nourishing substances and salts) are kept constant, 
in particular independent of time, for all the basins. However, one can also assume 
other conditions, in particular time-dependent conditions. In this paper we study 
the development of the population in question starting from a given initial number 
of microorganisms in the first basin till the stable (steady, stationary) case, taking into 
consideration conditions under which the stable state can be reached. 

The basic idea is to study the development of the population as a 
multiple-parameter, complex branching process of birth, aging, migration and death. Under 
the notion birth we understand the division into two or four (multiple birth) new 
organisms. This happens with two different known probabilities, of course, only after 
the organism becomes mature. 

The problem of maturing is expressed in such a way that we distinguish in 
the whole population, in all basins, between two groups of microorganisms, immature 
and mature ones, and the variation of the number of individuals in each of these two 
groups $N_1$, $N_2$ is studied. The number of such groups need not be limited to two. 
However, assuming a greater number of groups the computation of higher order 
moments becomes tedious. 

Introducing two groups, immature and mature, the process 
in question becomes an age-dependent branching stochastic process. It may be 
doubtful whether the age-dependence in the process is expressed by means of 
only two groups in a sufficient way. Some conclusions can be 
drawn when taking into consideration results obtained by Bellman, Harris and 
Bharucha-Reid (1960). 
The equation for the first moment $\bar N(t)$ of the one-dimensional, age-dependent 
branching process reads: 

$$\bar N(t) = 1 - G(t) + K \int_0^t \bar N(t - \tau)\, g(\tau)\, d\tau \tag{1.1}$$

where $G(t)$ is the distribution function of generation times, so that 

$$g(t)\, dt = \frac{dG(t)}{dt}\, dt$$

is the probability that a single individual born at time 0 is subject to generation changes 
during the time interval $(t, t + dt)$. A generation change means the division of a single 
individual into n individuals with probabilities $q_n$, where $\sum_n q_n = 1$, so that 

$$K = \sum_n n\, q_n \tag{1.2}$$

is the average number of new organisms into which the individual divides. 

A well-known useful tool for solving (1.1) is the Laplace integral. Thus 

$$\hat N(p) = \frac{1 - \hat G(p)}{1 - K \hat G(p)} \tag{1.3}$$

where 

$$\hat G(p) = p \int_0^\infty e^{-pt} G(t)\, dt.$$


By means of (1.1) or (1.3) we can study the variation of the expected value 
$\bar N(t)$ with varying distribution function $G(t)$ of generation times. The functions which 
seem to be very appropriate are of the type 

$$G(t) = \frac{a^n}{(n-1)!} \int_0^t \tau^{n-1} e^{-a\tau}\, d\tau \tag{1.4}$$

which lead to 

$$\hat G(p) = \frac{a^n}{(p+a)^n}. \tag{1.5}$$

The function $g(t)$ takes in this case its extremal value for $t = (n-1)/a$. Let us limit 
ourselves for a while to the case n = 1 (exponential distribution of generation times). 
We have by (1.3) 

$$\hat N(p) = \frac{p}{p - a(K-1)} \tag{1.6}$$

and hence 

$$\bar N(t) = e^{a(K-1)t}. \tag{1.7}$$

The exponent is positive whenever K > 1, so that $\bar N(t)$ then increases exponentially 
with time. 
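The exponential case can be checked without the Laplace integral by solving the renewal equation numerically. The sketch below rests on assumptions that should be stated plainly: equation (1.1) is read as $\bar N(t) = 1 - G(t) + K\int_0^t \bar N(t-\tau) g(\tau)\,d\tau$, the generation times are exponential with $g(t) = a e^{-at}$, and the function name `mean_population` and the trapezoidal discretisation are illustrative choices, not the author's method.

```python
import math


def mean_population(a, K, t_end, steps):
    """Solve N(t) = 1 - G(t) + K * int_0^t N(t - s) g(s) ds on a uniform grid,
    with g(s) = a * exp(-a * s), using the trapezoidal rule."""
    h = t_end / steps
    g = [a * math.exp(-a * j * h) for j in range(steps + 1)]   # density on the grid
    values = [1.0]                                             # N(0) = 1
    for i in range(1, steps + 1):
        # Trapezoidal sum without the unknown end term 0.5 * N_i * g_0.
        partial = 0.5 * values[0] * g[i]
        partial += sum(values[i - j] * g[j] for j in range(1, i))
        rhs = math.exp(-a * i * h) + K * h * partial           # 1 - G(t_i) = exp(-a t_i)
        # Move the unknown end term to the left-hand side and solve for N_i.
        values.append(rhs / (1.0 - 0.5 * K * h * g[0]))
    return values


if __name__ == "__main__":
    a, K, t_end = 1.0, 2.0, 2.0
    numeric = mean_population(a, K, t_end, 400)[-1]
    exact = math.exp(a * (K - 1.0) * t_end)
    print(numeric, exact)
```

With a = 1 and K = 2 the numerical value at t = 2 lies close to $e^{2}$, in agreement with the closed form $e^{a(K-1)t}$.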
Now, let us compare this case with another one, when the generation change 
takes place always suddenly at a fixed age $t_0$ and cannot take place at other ages. 
Then 

$$\hat G(p) = e^{-p t_0} \tag{1.8}$$
and therefi N mer Pt 
an È Ni) = SS — ho Rea 

ab ms isKe NEE °H) ve (1.9) 


The curve (1.9) is of “exponential” character, or, more precisely, oscillates along an 
exponential curve. Hence it follows that even so trivial a function $\hat G(p)$ as given by 
(1.8) has no essential influence on the behaviour of $\bar N(t)$, provided we do not take into 
consideration short-term fluctuations. 


For an arbitrary integer 2 we have by (1.3) and (1.5) 


w M9 ERE 
(p-+a)"—Ka" 
Let us put p+a = 1 
mq” 


then we get (b) v(t) = 
f- 


k t 
and also (e) MOQ) = vie "+a f y(r) ece” dr. 
Determining the function v(t) by (1-5) we determine at the same time, the function 
Ni). “Dhe function v(t) can be determined e.g. by the partial fraction expansion method 

1 gra n qt 
(da) x) = al NË E K—l > që ) 
K n kal 


of (b). Clearly, we have 


as Qn opts (ATE 1 
where gra VE [ SO ay Gale ae ji 
Now, we can express the function $\bar N(t)$ for an arbitrary integer n. Since, 
with the only exception of n = 1 and n = 2, the roots $q_k$ are always complex, $\bar N(t)$ 
is of damped-wave shape. 

The probability density function of generation times reads in this case 

$$g(t) = \frac{a^n t^{n-1}}{(n-1)!}\, e^{-at}.$$

Since the maximum of $g(t)$ is reached for $t = (n-1)/a$, we can expect that $\bar N(t)$ is of 
oscillation character mainly if the division takes place at old age only. 
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Now, let us give the formulae for the functions g(t) and v(t) for n = 2, 3, 4. 
g(t) = art gjet 


N= 2 1 gë 
V(t) = > [1+(K—1) cos at VET 
K 
= GF ia 
g(t) = AT 
p YK IE 33 
= atl K —UËN aly/33/K 
v(t) = a [1+ a (e oer: cos ee )] 
E 3 5 
r ai . 
glt) = ST“ pë 
n=4 


| K— 1 a Ap ina a > 

v(t) = K [1+ —— ( cos at YK-+ cos at YK ) g 

The damped-wave character of oscillations for $n \ge 3$ is evident. The frequency 
of oscillations increases with increasing $Ka^n$. After $v(t)$ is determined, the function 
$\bar N(t)$ is given by (c). 

According to the values of particular parameters in question, different cases 
can take place. Thus, e.g. for n = 2 the function $\bar N(t)$ decreases exponentially with 
increasing t whenever 1 > K and it increases otherwise. 

In what follows we assume that after reaching the group 2 the probability 
of a generation change is the same for all individuals belonging to this group, be it the 
case of a single or a multiple birth. Thus, the process makes a branch. In general, 
different branches correspond to different probabilities. 

2. CULTURE “FOURTERMINAL” 

The culture fourterminal (see Fig. 1 on p. 63) is a basin filled with a fluid, in which 
some microorganisms live; these are, according to some criterion, e.g. according to their 
age, divided into two groups. Let the number of individuals in each group be 
dependent on time t and equal to $N_1(t)$ and $N_2(t)$, respectively. Microorganisms migrate 
to the basin from the left with probability parameters equal to $\tilde N_1(t)$ and $\tilde N_2(t)$, 
assuming that the probability that a single individual will migrate from the left to 
the basin in the interval $(t, t + \Delta t)$ is equal to 

$\tilde N_1(t) D' \Delta t + o(\Delta t)$ for type (1) 
$\tilde N_2(t) D' \Delta t + o(\Delta t)$ for type (2) 

where $\lim o(\Delta t)/\Delta t = 0$ for $\Delta t \to 0$. 

The microorganisms migrate from the basin to the right in such a way that 
the probability that a single individual will migrate from the basin to the right in 
the interval $(t, t + \Delta t)$ is equal to 

$N_1(t) D'' \Delta t + o(\Delta t)$ for type (1) 
$N_2(t) D'' \Delta t + o(\Delta t)$ for type (2). 

Organisms of both the groups can also die. The probability that a single individual 
will die in the interval $(t, t + \Delta t)$ is equal to 

$N_1(t) \delta_1 \Delta t + o(\Delta t)$ for type (1) 
$N_2(t) \delta_2 \Delta t + o(\Delta t)$ for type (2). 

Organisms of the first group mature during their stay in the basin and thus pass to the 
second group. The probability that a single individual passes from the first group 
to the second group in the interval $(t, t + \Delta t)$ is equal to 

$N_1(t) \mu_{1,2} \Delta t + o(\Delta t)$. 

The generation change takes place only in the second group. The probability 
that a single individual of the second group divides into two new individuals of the 
first group in the interval $(t, t + \Delta t)$ is equal to 

$N_2(t) \mu_{2,2} \Delta t + o(\Delta t)$ 

and in a similar way the probability that a multiple birth takes place in the interval 
$(t, t + \Delta t)$ is equal to 

$N_2(t) \mu_{2,4} \Delta t + o(\Delta t)$. 

The particular probability parameters for entrance to and exit from the basin, $D'$ and $D''$, 
can be influenced mechanically, i.e. by a different choice of the diameter and the 
placing of the pipes under consideration, and also by changing the speed of the fluid. In 
the first approximation we can write 
$e 
EF 


where v is the velocity in the connection pipes, f the area of the orifice, and V the 
volume of the basin. 

By contrast, $\delta_1$, $\delta_2$, $\mu_{1,2}$, $\mu_{2,2}$, $\mu_{2,4}$ can be influenced mainly by a proper 
choice of external conditions (the temperature, the contents of nourishing substances 
in the fluid, the amount of the absorbed light, etc.). Now, our task is to write down 
the fundamental equation for $P(\tilde N_1, \tilde N_2, N_1, N_2, t)$, i.e. for the probability that 
at the moment t the number of organisms of both the types in the basin in question is 
equal to $N_1(t)$ and $N_2(t)$ while the number of organisms of both the types in the 
preceding basins is equal to $\tilde N_1(t)$ and $\tilde N_2(t)$, respectively. Having started from the above 
mentioned elementary probabilities and proceeding in a usual way, we get the equation 


2 PU, Në, Nu No! 
= [PN Në, Miti, Ne (Ny )—P: Në Nv Ny ANJ, 
HPW, Ng Np Math Na+) -PN Në, Ny Na 4 Ja 
PWN +. Ng N,-1, Ng gi PO Na Nj, N ns past uj 
IPW, Ni+1 Nu Ne} NHD, Në My No OND 
HPN No N,+1, No NN +1) -PW No, Ny No, PN D" 
LIPON, N., Ny Neth N+) -PAn No Ny Noa, t)N:)D 
ENË, 2 Nat, (Na+) -PN No, Ni, Na, t)N allze 


7’ Në, N —2, ; 

ae pë N4 Nat L Na+) PW Ne, Ny Ng YN alhos 

HIP (Në, nË MF, Ny): IN, --)—PN: Ne, No Na ON) w (21) 
r si 
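From this equation we determine first and second order moments of particular random 
variables, i.e. $N_1(t)$ and $N_2(t)$ or $\tilde N_1(t)$ and $\tilde N_2(t)$, and all covariances of interest, that 
characterise the population in the basin. To find an explicit solution of equation 
(2.1) is not easy. However, the moments can be easily determined (see Feller, 1958, 
p. 411). For first moments we get equations 

$$\frac{d}{dt} \bar N_1(t) = -\bar N_1(t)(\delta_1 + D'' + \mu_{1,2}) + \bar N_2(t)(2\mu_{2,2} + 4\mu_{2,4}) + \bar{\tilde N}_1(t) D' \tag{2.2a}$$

and 

$$\frac{d}{dt} \bar N_2(t) = \bar N_1(t)\mu_{1,2} - \bar N_2(t)(\delta_2 + D'' + \mu_{2,2} + \mu_{2,4}) + \bar{\tilde N}_2(t) D'. \tag{2.2b}$$

We apply the Laplace-Wagner integral to both the equations, setting $\bar N_1(t) = N_{10}$ and 
$\bar N_2(t) = N_{20}$ for t = 0. We get 

$$\begin{pmatrix} p + \delta_1 + D'' + \mu_{1,2} & -(2\mu_{2,2} + 4\mu_{2,4}) \\ -\mu_{1,2} & p + \delta_2 + D'' + \mu_{2,2} + \mu_{2,4} \end{pmatrix} \begin{pmatrix} \hat N_1(p) \\ \hat N_2(p) \end{pmatrix} = D' \begin{pmatrix} \hat{\tilde N}_1(p) \\ \hat{\tilde N}_2(p) \end{pmatrix} + p \begin{pmatrix} N_{10} \\ N_{20} \end{pmatrix} \tag{2.3}$$

which is the fundamental equation for calculation of expected values $\bar N_1(t)$ and $\bar N_2(t)$ 
of the cultivation fourterminal. The Laplace operator was denoted in equation 
(2.3) by p. 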

From equation (2.3) we can derive some interesting results for one-side-isolated 


cultivation fourterminal, which is defined by Ni(t) = 0 and N;(t) = 0 for allt. Then 
we have 


The Laplace operator was denoted in equation 


(2.4) 


Nj 
P+S8s+D" + Hoot Hoss Arat 2fty,4) | 50) 


sol “I fm PEED Eg 


where $\Delta$ is the determinant of the square matrix occurring in (2.3). Evidently, the influence of $D'$ disappears. The determinant $\Delta$ is a quadratic function of $p$. After multiplying we get

$$\Delta = (p+\delta_1+D''+\mu_{1,2})(p+\delta_2+D''+\mu_{2,2}+\mu_{2,4}) - 2\mu_{1,2}(\mu_{2,2}+2\mu_{2,4}).$$

Let us ask when the limits of (2.4) for $p \to 0$ exist and do not equal 0 (the limit for $p \to 0$ corresponds to the limit for $t \to \infty$; Tauber's Theorem, Pol-Bremmer (1950)). Evidently, this case takes place if

$$(\delta_1+D''+\mu_{1,2})(\delta_2+D''+\mu_{2,2}+\mu_{2,4}) - 2\mu_{1,2}(\mu_{2,2}+2\mu_{2,4}) = 0. \tag{2.5}$$

Namely, we have then

$$\Delta = p\,[\,p+\delta_1+\delta_2+2D''+\mu_{1,2}+\mu_{2,2}+\mu_{2,4}\,]$$

so that we can reduce the powers of $p$ in (2.4), then set $p = 0$ and determine the limit values of $\bar N_1(t)$ and $\bar N_2(t)$.
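Condition (2.5) and the limit obtained from (2.4) can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the function name, the argument names and the tolerance are hypothetical, and the numbers in the usage note are illustrative values chosen so that (2.5) holds.

```python
def single_basin_limit(delta1, delta2, d_out, mu12, mu22, mu24,
                       n10, n20, tol=1e-9):
    """Limit values of the first moments of a one-side-isolated basin.

    d_out plays the role of D''. Returns None when condition (2.5)
    is violated, i.e. when no non-zero stable state exists.
    """
    a = delta1 + d_out + mu12            # first diagonal element at p = 0
    d = delta2 + d_out + mu22 + mu24     # second diagonal element at p = 0
    b = 2.0 * (mu22 + 2.0 * mu24)        # type-1 organisms produced by division
    c = mu12                             # maturation of type 1 into type 2
    if abs(a * d - b * c) > tol:         # left-hand side of (2.5)
        return None
    s = a + d                            # then Delta = p (p + s)
    # (2.4) with one power of p cancelled, evaluated at p = 0
    n1 = (d * n10 + b * n20) / s
    n2 = (c * n10 + a * n20) / s
    return n1, n2
```

With `single_basin_limit(0.4, 2.0, 2.0, 4.0, 4.0, 2.0, 1.0, 0.0)` condition (2.5) is met and the function returns the two limit values; raising `d_out` to 3.0 violates (2.5) and the function returns `None`.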


Hence, equation (2.5) represents the condition that in an isolated culture fourterminal the non-zero stable state can be reached. From the expression (2.3) for first order moments of a general culture fourterminal, we can easily derive expressions for special fourterminals. Thus, for example, for the reception twoterminal, i.e. for the reservoir defined by

$$\delta_1 = 0,\quad D'' = 0,\quad \mu_{1,2} = 0,\quad \mu_{2,2} = 0,\quad \mu_{2,4} = 0,\quad \delta_2 = 0,\quad N_{1,0} = 0,\quad N_{2,0} = 0,$$

we get

$$\bar N_1(p) = \frac{D'}{p}\,\bar N_1'(p), \qquad \bar N_2(p) = \frac{D'}{p}\,\bar N_2'(p),$$

so that we have

$$\bar N_i(t) = D' \int_0^t \bar N_i'(\tau)\,d\tau, \qquad i = 1, 2. \tag{2.6}$$

In a similar way we can proceed even in other cases when the particular probability parameters have special values. Also, for second order moments of the fourterminal the necessary expressions can be derived. However, the derivation is more tedious, because not only second order moments of $N_1$, $N_2$, $N_1'$, $N_2'$, but also the corresponding covariances $\operatorname{cov}(N_1, N_2)$, $\operatorname{cov}(N_1', N_1)$, $\operatorname{cov}(N_1', N_2)$, $\operatorname{cov}(N_2', N_1)$, $\operatorname{cov}(N_2', N_2)$ must be used for calculation. Altogether we have to deal with mutual relations of seven quantities. Under the same assumption as for first order moments we can derive from equation (2.1) the following simultaneous differential equations for second order moments,

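$$\begin{aligned}
\text{(a)}\quad \frac{d}{dt}\operatorname{var} N_1 ={}& -2\operatorname{var} N_1\,(\delta_1 + D'' + \mu_{1,2}) + 4(\mu_{2,2} + 2\mu_{2,4})\operatorname{cov}(N_1, N_2) + 2D'\operatorname{cov}(N_1', N_1) \\
&+ \bar N_1(\delta_1 + D'' + \mu_{1,2}) + 4\bar N_2(\mu_{2,2} + 4\mu_{2,4}) + D'\bar N_1' \\
\text{(b)}\quad \frac{d}{dt}\operatorname{var} N_2 ={}& -2\operatorname{var} N_2\,(\delta_2 + D'' + \mu_{2,2} + \mu_{2,4}) + 2\mu_{1,2}\operatorname{cov}(N_1, N_2) + 2D'\operatorname{cov}(N_2', N_2) \\
&+ \bar N_2(\delta_2 + D'' + \mu_{2,2} + \mu_{2,4}) + \mu_{1,2}\bar N_1 + D'\bar N_2' \\
\text{(c)}\quad \frac{d}{dt}\operatorname{cov}(N_1, N_2) ={}& -\operatorname{cov}(N_1, N_2)\,(\delta_1 + \delta_2 + 2D'' + \mu_{1,2} + \mu_{2,2} + \mu_{2,4}) + D'\operatorname{cov}(N_1', N_2) \\
&+ D'\operatorname{cov}(N_2', N_1) + \mu_{1,2}\operatorname{var} N_1 + 2\operatorname{var} N_2\,(\mu_{2,2} + 2\mu_{2,4}) \\
&- \mu_{1,2}\bar N_1 - 2\bar N_2(\mu_{2,2} + 2\mu_{2,4}) \\
\text{(d)}\quad \frac{d}{dt}\operatorname{cov}(N_1', N_1) ={}& -\operatorname{cov}(N_1', N_1)\,(\delta_1 + D' + D'' + \mu_{1,2}) + 2(\mu_{2,2} + 2\mu_{2,4})\operatorname{cov}(N_1', N_2) \\
&+ D'\operatorname{var} N_1' - D'\bar N_1' \\
\text{(e)}\quad \frac{d}{dt}\operatorname{cov}(N_1', N_2) ={}& -\operatorname{cov}(N_1', N_2)\,(\delta_2 + D' + D'' + \mu_{2,2} + \mu_{2,4}) \\
&+ D'\operatorname{cov}(N_1', N_2') + \mu_{1,2}\operatorname{cov}(N_1', N_1) \\
\text{(f)}\quad \frac{d}{dt}\operatorname{cov}(N_2', N_1) ={}& -\operatorname{cov}(N_2', N_1)\,(\delta_1 + D' + D'' + \mu_{1,2}) \\
&+ D'\operatorname{cov}(N_1', N_2') + 2\operatorname{cov}(N_2', N_2)\,(\mu_{2,2} + 2\mu_{2,4}) \\
\text{(g)}\quad \frac{d}{dt}\operatorname{cov}(N_2', N_2) ={}& -\operatorname{cov}(N_2', N_2)\,(\delta_2 + D' + D'' + \mu_{2,2} + \mu_{2,4}) \\
&+ \mu_{1,2}\operatorname{cov}(N_2', N_1) + D'\operatorname{var} N_2' - D'\bar N_2'.
\end{aligned} \tag{2.7}$$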


These relatively complicated equations are again transformed by means of Laplace 
integral. Without any loss of generality we can suppose that for t = 0 all second order 
moments equal 0. In such a way further calculations simplify a little and we can 
immediately write a matrix relation, which resembles relation (2.3), namely 


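$$\begin{Vmatrix}\operatorname{var} N_1\\ \operatorname{var} N_2\\ \operatorname{cov}(N_1, N_2)\\ \operatorname{cov}(N_1', N_1)\\ \operatorname{cov}(N_1', N_2)\\ \operatorname{cov}(N_2', N_1)\\ \operatorname{cov}(N_2', N_2)\end{Vmatrix} = B^{-1}(p)\,L\begin{Vmatrix}\operatorname{var} N_1'\\ \operatorname{var} N_2'\\ \operatorname{cov}(N_1', N_2')\\ \bar N_1'\\ \bar N_2'\\ \bar N_1\\ \bar N_2\end{Vmatrix} \tag{2.8}$$

where both $B(p)$ and $L$ are square matrices 7 by 7. We have

$$B(p) = \begin{Vmatrix}
p + 2(\delta_1 + D'' + \mu_{1,2}) & 0 & -4(\mu_{2,2} + 2\mu_{2,4}) & -2D' & 0 & 0 & 0\\
0 & p + 2(\delta_2 + D'' + \mu_{2,2} + \mu_{2,4}) & -2\mu_{1,2} & 0 & 0 & 0 & -2D'\\
-\mu_{1,2} & -2(\mu_{2,2} + 2\mu_{2,4}) & p + \delta_1 + \delta_2 + 2D'' + \mu_{1,2} + \mu_{2,2} + \mu_{2,4} & 0 & -D' & -D' & 0\\
0 & 0 & 0 & p + \delta_1 + D' + D'' + \mu_{1,2} & -2(\mu_{2,2} + 2\mu_{2,4}) & 0 & 0\\
0 & 0 & 0 & -\mu_{1,2} & p + \delta_2 + D' + D'' + \mu_{2,2} + \mu_{2,4} & 0 & 0\\
0 & 0 & 0 & 0 & 0 & p + \delta_1 + D' + D'' + \mu_{1,2} & -2(\mu_{2,2} + 2\mu_{2,4})\\
0 & 0 & 0 & 0 & 0 & -\mu_{1,2} & p + \delta_2 + D' + D'' + \mu_{2,2} + \mu_{2,4}
\end{Vmatrix}$$

and

$$L = \begin{Vmatrix}
0 & 0 & 0 & D' & 0 & \delta_1 + D'' + \mu_{1,2} & 4(\mu_{2,2} + 4\mu_{2,4})\\
0 & 0 & 0 & 0 & D' & \mu_{1,2} & \delta_2 + D'' + \mu_{2,2} + \mu_{2,4}\\
0 & 0 & 0 & 0 & 0 & -\mu_{1,2} & -2(\mu_{2,2} + 2\mu_{2,4})\\
D' & 0 & 0 & -D' & 0 & 0 & 0\\
0 & 0 & D' & 0 & 0 & 0 & 0\\
0 & 0 & D' & 0 & 0 & 0 & 0\\
0 & D' & 0 & 0 & -D' & 0 & 0
\end{Vmatrix}$$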


The matrix $L$ consists mostly of zeros. Equation (2.8) enables us to compute all second order moments of the population in the basin, provided we know both moments of the immigrating population and the first order moments of the population in the basin. The calculations are cumbersome, however, possible. The stable state can be considered in a similar way as it was in the first moments case. The determinant of the matrix $B(p)$ is a polynomial in $p$. On the contrary, $L$ does not depend on $p$. The limit values for $t \to \infty$, provided such limits exist, can be derived when limiting (2.8) for $p \to 0$. The computation of such limit values is relatively easy.
3. CLOSED CHAIN OF CULTURE FOURTERMINALS

For the sake of continuity we join single culture fourterminals into a continuous chain. This section is devoted to the derivation of the expected values $\bar N_1(t)$ and $\bar N_2(t)$ in particular links of the chain. These values are in an unstable state dependent on time. In an equilibrium state they are stable. In what follows we restrict ourselves to the first order moments, because the calculation of second order moments is essentially similar and, of course, very tedious.

Let us assume a chain as sketched in Fig. 2 on p. 63. It consists of a set of basins through which the fluid flows and carries the cultured microorganisms. Let us number the basins by $1, 2, \dots, n$. We shall also distinguish the values $N_1$, $N_2$ by corresponding indices, so that $N_{1,1}$ and $N_{2,1}$ are the instantaneous numbers of microorganisms in the first and the second group of the first basin, respectively. In a similar way we proceed in the other basins, so that in general we write $N_{1,i}$ and $N_{2,i}$. In an analogous way we distinguish between the corresponding moments $\bar N_{1,i}$ and $\bar N_{2,i}$.

Let us assume that at the time instant $t = 0$ the fluid in the basins contains no microorganisms. At the same time instant $N_{1,0}$ microorganisms of the first group and $N_{2,0}$ microorganisms of the second group are supplied into the first basin only and the process of cultivation starts. If the conditions for reaching the stable state are fulfilled, then this state is really reached after some time (theoretically after an infinite time). By (2.3) we get for the $i$-th basin the following abbreviated expression

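$$\begin{Vmatrix}\bar N_{1,i}'(p)\\ \bar N_{2,i}'(p)\end{Vmatrix} = \frac{\mathbf{a}_i}{D_i'}\begin{Vmatrix}\bar N_{1,i}(p)\\ \bar N_{2,i}(p)\end{Vmatrix} - \frac{p}{D_i'}\begin{Vmatrix}N_{1,i}(0)\\ N_{2,i}(0)\end{Vmatrix} \tag{3.1}$$

Clearly we have also

$$\bar N_{1,i}'(p) = \bar N_{1,i-1}(p); \qquad \bar N_{1,i+1}'(p) = \bar N_{1,i}(p) \tag{3.1a}$$

$$\bar N_{2,i}'(p) = \bar N_{2,i-1}(p); \qquad \bar N_{2,i+1}'(p) = \bar N_{2,i}(p) \tag{3.1b}$$

$$D_i' = D_{i-1}''; \qquad D_{i+1}' = D_i''. \tag{3.1c}$$

According to the assumptions stated above, we have $N_{1,i}(0) = 0$, $N_{2,i}(0) = 0$ for all $i$ except $i = 1$. Thus we get (omitting the operator $p$)

$$\begin{Vmatrix}\bar N_{1,1}'\\ \bar N_{2,1}'\end{Vmatrix} = \frac{\mathbf{a}_1}{D_1'}\begin{Vmatrix}\bar N_{1,1}\\ \bar N_{2,1}\end{Vmatrix} - \frac{p}{D_1'}\begin{Vmatrix}N_{1,0}\\ N_{2,0}\end{Vmatrix}$$

$$\begin{Vmatrix}\bar N_{1,1}\\ \bar N_{2,1}\end{Vmatrix} = \frac{\mathbf{a}_2}{D_2'}\begin{Vmatrix}\bar N_{1,2}\\ \bar N_{2,2}\end{Vmatrix}$$

$$\dots$$

and finally

$$\begin{Vmatrix}\bar N_{1,n-1}\\ \bar N_{2,n-1}\end{Vmatrix} = \frac{\mathbf{a}_n}{D_n'}\begin{Vmatrix}\bar N_{1,n}\\ \bar N_{2,n}\end{Vmatrix}. \tag{3.2}$$

Since by the assumption $\bar N_{1,1}' = \bar N_{1,n}$ and also $\bar N_{2,1}' = \bar N_{2,n}$, we can finally write

$$\left[\frac{\mathbf{a}_1 \mathbf{a}_2 \dots \mathbf{a}_n}{D_1' D_2' \dots D_n'} - \mathbf{j}\right]\begin{Vmatrix}\bar N_{1,n}\\ \bar N_{2,n}\end{Vmatrix} = \frac{p}{D_1'}\begin{Vmatrix}N_{1,0}\\ N_{2,0}\end{Vmatrix} \tag{3.3}$$

where $\mathbf{j}$ is the unit matrix of the second order.

A special attention should be directed to the matrix $\mathbf{a}_n$. Namely, the $n$-th fourterminal has two kinds of immigration (see Fig. 3 on p. 63): either to the reservoir or to the first fourterminal (the feed-back). If we denote by $D_{1n}''$ the probability parameter of immigration to the reservoir and by $D_{2n}''$ the probability parameter of immigration to the first fourterminal, so that

$$D_n'' = D_{1n}'' + D_{2n}'' \tag{3.4}$$

then the matrix $\mathbf{a}_n$ of the $n$-th fourterminal will contain the whole probability parameter $D_n''$. By (2.6) the growth of the number of microorganisms in the reservoir is then expressed by

$$\bar N_{1R}(t) + \bar N_{2R}(t) = D_{1n}'' \int_0^t \left[\bar N_{1,n}(\tau) + \bar N_{2,n}(\tau)\right] d\tau. \tag{3.5}$$

Evidently, we have then

$$D_{2n}'' = D_1'. \tag{3.6}$$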


If the conditions in all the subsequent basins are the same, then also all the matrices $\mathbf{a}_i$ equal each other and (3.3) simplifies to

$$\left[\frac{\mathbf{a}^n}{D_1' D_2' \dots D_n'} - \mathbf{j}\right]\begin{Vmatrix}\bar N_{1,n}\\ \bar N_{2,n}\end{Vmatrix} = \frac{p}{D_1'}\begin{Vmatrix}N_{1,0}\\ N_{2,0}\end{Vmatrix}.$$

The equation (3.3) enables us to compute the operator transforms $\bar N_{1,n}(p)$ and $\bar N_{2,n}(p)$ of the expected values $\bar N_{1,n}(t)$ and $\bar N_{2,n}(t)$, respectively.

Thus, being able to compute these values we can also compute from equation (3.5) the growth of the number of microorganisms in the reservoir. As to the first order moments, the whole problem of continual cultivation is thus solved. In a similar way we can proceed when computing second order moments using equation (2.8). However, the computations are cumbersome and will be omitted in this paper.

On the contrary, we will consider the problem of determining such conditions for the closed chain of culture fourterminals that the process not only can reach the stable state but also stay in it.


4. CONDITIONS FOR REACHING THE STABLE STATE IN THE CULTURE SYSTEM

These conditions follow from equation (2.2) in a similar way as in the case of one basin (see (2.2) and (2.3)). For the expected values $\bar N_{1,n}(t)$ and $\bar N_{2,n}(t)$ to stabilize on some non-zero values it is necessary and sufficient that the determinant of the final square matrix of the second order, i.e., of the matrix

$$\frac{\mathbf{a}_1 \mathbf{a}_2 \dots \mathbf{a}_n}{D_1' D_2' \dots D_n'} - \mathbf{j} \tag{4.1}$$

has the value zero for $p \to 0$. Then it is possible, similarly as in the case of a single basin (see (2.4)), to reduce the power of the operator $p$, which is the condition for $\bar N_{1,n}(t)$ and $\bar N_{2,n}(t)$ to stabilize on non-zero values.

The general computation procedure is evident from equations (3.1), (3.2) and (3.3). Now, the question is to derive the stability condition and to determine the stable values $\lim_{t \to \infty} \bar N_1(t)$ and $\lim_{t \to \infty} \bar N_2(t)$ in all basins forming the closed chain and finally to determine regularity conditions that govern the culture during the transient period before the stable state is reached.

For determining average values at the end of the chain, i.e., the values $\bar N_{1,n}(t)$ and $\bar N_{2,n}(t)$, we have at our disposal equation (3.3) which enables us to compute the corresponding Laplace transforms $\bar N_{1,n}(p)$ and $\bar N_{2,n}(p)$.
The first step is to compute the matrix product $\mathbf{a}_1 \mathbf{a}_2 \dots \mathbf{a}_n$ which contains the operator $p$. In order to get simple results let us introduce the following notation:

$$\mathbf{a}_i = \begin{Vmatrix} a_i + p & b_i\\ c_i & d_i + p\end{Vmatrix} \tag{4.2}$$

where $a_i$, $b_i$, $c_i$ and $d_i$ are constants dependent on conditions in the $i$-th basin (cf. (2.2)). Let us calculate the eigenvalues $(\lambda_i)_{1,2}$ of the matrix $\mathbf{a}_i$ so that we set

$$\begin{vmatrix} a_i - (\lambda_i - p) & b_i\\ c_i & d_i - (\lambda_i - p)\end{vmatrix} = 0.$$

Hence

$$(\lambda_i - p)^2 - (\lambda_i - p)(a_i + d_i) + a_i d_i - b_i c_i = 0 \tag{4.3}$$

so that

$$(\lambda_i)_{1,2} = p + \tfrac{1}{2}(a_i + d_i) \pm \sqrt{\tfrac{1}{4}(a_i + d_i)^2 - (a_i d_i - b_i c_i)}. \tag{4.4}$$

Evidently the eigenvalues of the matrix $\mathbf{a}_i$ are linear functions of the operator $p$.


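Let us now compute the product $\mathbf{v}(p)$ of the matrices

$$\mathbf{v}(p) = \mathbf{a}_1 \mathbf{a}_2 \dots \mathbf{a}_n \tag{4.5}$$

where we get by (4.2):

$$\mathbf{a}_i = \begin{Vmatrix} a_i & b_i\\ c_i & d_i\end{Vmatrix} + p\begin{Vmatrix} 1 & 0\\ 0 & 1\end{Vmatrix} = \mathbf{a}_{0,i} + p\,\mathbf{j}. \tag{4.5a}$$

Then we get

$$\mathbf{v}(p) = (\mathbf{a}_{0,1} + p\,\mathbf{j})(\mathbf{a}_{0,2} + p\,\mathbf{j}) \dots (\mathbf{a}_{0,n} + p\,\mathbf{j}) = \mathbf{a}_{0,1}\mathbf{a}_{0,2} \dots \mathbf{a}_{0,n} + p \sum_{k} \frac{\mathbf{a}_{0,1}\mathbf{a}_{0,2} \dots \mathbf{a}_{0,n}}{\mathbf{a}_{0,k}} + p^2 \sum_{k<l} \frac{\mathbf{a}_{0,1}\mathbf{a}_{0,2} \dots \mathbf{a}_{0,n}}{\mathbf{a}_{0,k}\mathbf{a}_{0,l}} + \dots + p^n \mathbf{j}. \tag{4.6}$$

Let us remark that expressions such as $\dfrac{\mathbf{a}_{0,1}\mathbf{a}_{0,2} \dots \mathbf{a}_{0,n}}{\mathbf{a}_{0,k}}$ and $\dfrac{\mathbf{a}_{0,1}\mathbf{a}_{0,2} \dots \mathbf{a}_{0,n}}{\mathbf{a}_{0,k}\mathbf{a}_{0,l}}$ are symbols only and by no means denote division by the matrices $\mathbf{a}_{0,k}$ or by the matrix product $\mathbf{a}_{0,k}\mathbf{a}_{0,l}$; they stand for the product in which the indicated factors are left out. These symbols are introduced for the sake of simplicity of writing only.

Let us rewrite (4.6) in the form

$$\mathbf{v}(p) = K_{0,0} + p K_{0,1} + p^2 K_{0,2} + \dots + p^n \mathbf{j} \tag{4.7}$$

where the symbols $K_{0,k}$ are clear from comparison with (4.6). The determinant of the matrix $\mathbf{v}(p)$ is for $p \to 0$ equal to

$$|\mathbf{v}(0)| = |\mathbf{a}_{0,1}\mathbf{a}_{0,2} \dots \mathbf{a}_{0,n}| = |K_{0,0}| \tag{4.8}$$

and can be computed as the product of determinants of the matrices in question. Let us introduce the notation

$$K_{0,k} = \begin{Vmatrix} \alpha_{1,k} & \alpha_{2,k}\\ \alpha_{3,k} & \alpha_{4,k}\end{Vmatrix}. \tag{4.9}$$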
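Then we can write instead of (4.7) or (4.5) the matrix

$$\mathbf{v}(p) = \begin{Vmatrix} \alpha_{1,0} + p\alpha_{1,1} + p^2\alpha_{1,2} + \dots & \alpha_{2,0} + p\alpha_{2,1} + p^2\alpha_{2,2} + \dots\\ \alpha_{3,0} + p\alpha_{3,1} + p^2\alpha_{3,2} + \dots & \alpha_{4,0} + p\alpha_{4,1} + p^2\alpha_{4,2} + \dots\end{Vmatrix} \tag{4.10}$$

Now let us determine the determinant of the matrix (4.1). Clearly we have

$$\frac{\mathbf{a}_1 \mathbf{a}_2 \dots \mathbf{a}_n}{D_1' D_2' \dots D_n'} - \mathbf{j} = \frac{1}{D_1' D_2' \dots D_n'}\left[\mathbf{v}(p) - D_1' D_2' \dots D_n'\,\mathbf{j}\right].$$

Hence the determinant in question equals, apart from the non-zero factor $(D_1' D_2' \dots D_n')^{-2}$,

$$\Delta(p) = |\mathbf{v}(p) - D_1' D_2' \dots D_n'\,\mathbf{j}| = \begin{vmatrix} \alpha_{1,0} + p\alpha_{1,1} + \dots - D_1' D_2' \dots D_n' & \alpha_{2,0} + p\alpha_{2,1} + \dots\\ \alpha_{3,0} + p\alpha_{3,1} + \dots & \alpha_{4,0} + p\alpha_{4,1} + \dots - D_1' D_2' \dots D_n'\end{vmatrix} \tag{4.11}$$

With respect to the further need we will compute the coefficients at the zero-th and the first power of $p$ only. We have

$$\begin{aligned}
\Delta(p) ={}& (\alpha_{1,0} - D_1' D_2' \dots D_n')(\alpha_{4,0} - D_1' D_2' \dots D_n') - \alpha_{2,0}\alpha_{3,0} \\
&+ p\left[\alpha_{1,1}(\alpha_{4,0} - D_1' D_2' \dots D_n') + \alpha_{4,1}(\alpha_{1,0} - D_1' D_2' \dots D_n') - \alpha_{2,1}\alpha_{3,0} - \alpha_{3,1}\alpha_{2,0}\right] + p^2(\dots).
\end{aligned} \tag{4.12}$$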
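Evidently for $p = 0$ we have

$$\begin{aligned}
\Delta(0) &= (\alpha_{1,0} - D_1' D_2' \dots D_n')(\alpha_{4,0} - D_1' D_2' \dots D_n') - \alpha_{2,0}\alpha_{3,0} \\
&= (D_1' D_2' \dots D_n')^2 - D_1' D_2' \dots D_n'\,(\alpha_{1,0} + \alpha_{4,0}) + \alpha_{1,0}\alpha_{4,0} - \alpha_{2,0}\alpha_{3,0} \\
&= (D_1' D_2' \dots D_n')^2 - D_1' D_2' \dots D_n' \operatorname{tr} \mathbf{v}(0) + |\mathbf{v}(0)|.
\end{aligned} \tag{4.13}$$

In the expression (4.13) $\operatorname{tr} \mathbf{v}(0)$ denotes the trace of the matrix $\mathbf{v}(0)$, i.e. $\alpha_{1,0} + \alpha_{4,0}$, and $|\mathbf{v}(0)|$ the determinant of this matrix according to (4.8). Let us further denote the coefficient at the first power of $p$ in the equation (4.12) by

$$P(0) = \alpha_{1,1}(\alpha_{4,0} - D_1' D_2' \dots D_n') + \alpha_{4,1}(\alpha_{1,0} - D_1' D_2' \dots D_n') - \alpha_{2,1}\alpha_{3,0} - \alpha_{3,1}\alpha_{2,0}. \tag{4.14}$$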

The condition of the stable state is that the determinant (4.13) equals zero, thus

$$(D_1' D_2' \dots D_n')^2 - D_1' D_2' \dots D_n' \operatorname{tr} \mathbf{v}(0) + |\mathbf{v}(0)| = 0. \tag{4.15}$$

This is necessary and sufficient for the culture process to stabilize. This condition simplifies when all the matrices $\mathbf{a}_i$ in the product (4.5) are equal, i.e. when conditions in all $n$ basins are the same. The product (4.5) then becomes the $n$-th power of the matrix $\mathbf{a}$ and the stability condition simplifies by introducing the eigenvalues $\lambda_1$, $\lambda_2$ of this matrix (see (4.2) and (4.4), where the index should be deleted), taken for $p = 0$. The eigenvalues of the matrix $\mathbf{a}^n$ are $\lambda_1^n$, $\lambda_2^n$, the determinant $|\mathbf{a}^n| = \lambda_1^n \lambda_2^n$ and the trace $\operatorname{tr} \mathbf{a}^n = \lambda_1^n + \lambda_2^n$. Therefore (4.15) becomes

$$(D_1' D_2' \dots D_n')^2 - D_1' D_2' \dots D_n'\,(\lambda_1^n + \lambda_2^n) + \lambda_1^n \lambda_2^n = 0$$

which we can write in the form

$$(D_1' D_2' \dots D_n' - \lambda_1^n)(D_1' D_2' \dots D_n' - \lambda_2^n) = 0$$

so that finally either

$$D_1' D_2' \dots D_n' = \lambda_1^n \tag{4.16a}$$

or

$$D_1' D_2' \dots D_n' = \lambda_2^n. \tag{4.16b}$$

Thus, in the case of equal basins the stability condition is especially simple. Conditions (4.15) or (4.16a) and (4.16b) can be fulfilled in the easiest way by choosing $D_1'$ in a proper way. Let us remark that the other parameters $D_2', D_3', \dots, D_n'$ are not free; they are dependent on $D_1'', D_2'', \dots$ (see (3.1c)) and they participate in the matrices $\mathbf{a}_i$. Further, for the solution to be practically realizable we must have

$$D_1' < D_n''. \tag{4.16c}$$

Therefore we always choose, from the two possible values, the value that ensures (4.16c).
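The choice of $D_1'$ for equal basins can be sketched in Python as follows. This is a minimal sketch, not the author's procedure: the function and argument names are hypothetical, the matrix elements are passed as in (4.2) with $p = 0$ (so the off-diagonal elements carry their own signs), and the numbers in the usage note are illustrative.

```python
import math

def eigenvalues_at_p0(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] from (4.4) with p = 0, largest first."""
    half_trace = 0.5 * (a + d)
    disc = half_trace ** 2 - (a * d - b * c)
    if disc < 0:
        raise ValueError("complex eigenvalues are not handled in this sketch")
    root = math.sqrt(disc)
    return half_trace + root, half_trace - root

def feedback_parameter(a, b, c, d, d_in_rest, d_out_last):
    """Admissible values of D'_1 for n equal basins.

    d_in_rest  : [D'_2, ..., D'_n], fixed by (3.1c)
    d_out_last : D''_n, the upper bound required by (4.16c)
    """
    n = len(d_in_rest) + 1
    rest = math.prod(d_in_rest)
    # (4.16a) and (4.16b): D'_1 * D'_2 ... D'_n = lambda^n
    candidates = [lam ** n / rest for lam in eigenvalues_at_p0(a, b, c, d)]
    return [x for x in candidates if 0 < x < d_out_last]
```

For three equal basins with the illustrative matrix elements 7.4, -16, -4, 11 and $D_2' = D_3' = D_3'' = 3$, `feedback_parameter(7.4, -16.0, -4.0, 11.0, [3.0, 3.0], 3.0)` keeps only the candidate that comes from the smaller eigenvalue, because the other one violates (4.16c).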


5. COMPUTATION OF STABLE STATES 


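We shall start from equation (3.3). Employing (4.5) we can write instead of (3.3)

$$\left[\mathbf{v}(p) - D_1' D_2' \dots D_n'\,\mathbf{j}\right]\begin{Vmatrix}\bar N_{1,n}\\ \bar N_{2,n}\end{Vmatrix} = D_2' D_3' \dots D_n'\,p\begin{Vmatrix}N_{1,0}\\ N_{2,0}\end{Vmatrix}. \tag{5.1}$$

To calculate $\bar N_{1,n}$ and $\bar N_{2,n}$ we need the determinant of the final matrix given by the sum of matrices in the brackets on the left-hand side of this equation. However, according to (4.11) that equals the determinant $\Delta$. Let us denote $z = D_1' D_2' \dots D_n'$, and also

$$\Delta = \begin{vmatrix} \mu_1 - z & \mu_2\\ \mu_3 & \mu_4 - z\end{vmatrix}. \tag{5.2}$$

Then (5.1) becomes

$$\begin{Vmatrix} \mu_1 - z & \mu_2\\ \mu_3 & \mu_4 - z\end{Vmatrix}\begin{Vmatrix}\bar N_{1,n}\\ \bar N_{2,n}\end{Vmatrix} = \frac{zp}{D_1'}\begin{Vmatrix}N_{1,0}\\ N_{2,0}\end{Vmatrix}$$

and therefore

$$\bar N_{1,n} = \frac{zp}{D_1' \Delta}\left[N_{1,0}(\mu_4 - z) - N_{2,0}\,\mu_2\right] \tag{5.3a}$$

$$\bar N_{2,n} = \frac{zp}{D_1' \Delta}\left[-N_{1,0}\,\mu_3 + N_{2,0}(\mu_1 - z)\right]. \tag{5.3b}$$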


The determinant $\Delta$ is given by equation (5.2) and is the same as (4.12). After fulfilling the stability condition (see (4.15)) this determinant has its absolute term equal to 0, i.e. $\lim_{p \to 0} \Delta(p) = 0$. Therefore the ratio $p/\Delta$ can be cancelled by $p$ and thus the degree of the polynomial (4.12) decreases by one (the absolute term of this polynomial is, after the stability condition is satisfied, equal to zero!). Thus, we get the polynomial $\Delta/p$, for which

$$\lim_{p \to 0} \frac{\Delta}{p} = P(0).$$

By Tauber's theorem we get from (5.3a) and (5.3b) the following expressions for the limit values $\lim_{t \to \infty} \bar N_{1,n}(t)$, $\lim_{t \to \infty} \bar N_{2,n}(t)$, which will be further denoted by $\bar N_{1,n,s}$ and $\bar N_{2,n,s}$:

$$\bar N_{1,n,s} = \frac{z}{D_1' P(0)}\left[N_{1,0}(\mu_{4,0} - z) - N_{2,0}\,\mu_{2,0}\right] \tag{5.4a}$$

$$\bar N_{2,n,s} = \frac{z}{D_1' P(0)}\left[-N_{1,0}\,\mu_{3,0} + N_{2,0}(\mu_{1,0} - z)\right]. \tag{5.4b}$$

In these equations $\mu_{1,0}, \mu_{2,0}, \mu_{3,0}, \mu_{4,0}$ are the values of $\mu_1, \mu_2, \mu_3, \mu_4$ for $p = 0$, i.e. by (4.11) $\mu_{1,0} = \alpha_{1,0}$, $\mu_{2,0} = \alpha_{2,0}$, $\mu_{3,0} = \alpha_{3,0}$, $\mu_{4,0} = \alpha_{4,0}$, and $P(0)$ is given by equation (4.14).
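Thus we have everything prepared for the computation of the stable values $\bar N_{1,n,s}$ and $\bar N_{2,n,s}$. It remains to consider the simplification in the case of equal basins. To do this it is sufficient to determine the interpretation of the symbols $\mu_1, \mu_2, \mu_3, \mu_4$ and especially of the determinant $\Delta$ (cf. (5.1) and (5.2)). As already said, the determinant $\Delta(p)$ is then formed from the $n$-th power of the basic matrix, i.e. from $\mathbf{a}^n$.

According to Sylvester's theorem the power $\mathbf{a}^n$ can be expressed in the eigenvalues $\lambda_1$, $\lambda_2$ and in the elements $a + p$, $b$, $c$, $d + p$ of the basic matrix $\mathbf{a}$, so that

$$\mathbf{a}^n = \beta_n \begin{Vmatrix} a + p - \alpha_n & b\\ c & d + p - \alpha_n\end{Vmatrix} \tag{5.5}$$

where

$$\alpha_n = \lambda_1 \lambda_2\,\frac{\lambda_1^{n-1} - \lambda_2^{n-1}}{\lambda_1^n - \lambda_2^n}, \qquad \beta_n = \frac{\lambda_1^n - \lambda_2^n}{\lambda_1 - \lambda_2}.$$

It can be easily verified that the matrix $\mathbf{a}^n$ has the eigenvalues $\lambda_1^n$, $\lambda_2^n$. Therefore the determinant $\Delta_n$ corresponding to the matrix $\mathbf{a}^n - z\,\mathbf{j}$ is

$$\Delta_n = \beta_n^2 \begin{vmatrix} a + p - \alpha_n - \dfrac{z}{\beta_n} & b\\ c & d + p - \alpha_n - \dfrac{z}{\beta_n}\end{vmatrix}. \tag{5.6}$$

Hence, we have $\mu_1 = \beta_n(a + p - \alpha_n)$, $\mu_2 = \beta_n b$, $\mu_3 = \beta_n c$, $\mu_4 = \beta_n(d + p - \alpha_n)$. The corresponding values for $p = 0$ are

$$\mu_{1,0} = \beta_{n,0}(a - \alpha_{n,0}) \tag{5.7a}$$

$$\mu_{2,0} = \beta_{n,0}\,b \tag{5.7b}$$

$$\mu_{3,0} = \beta_{n,0}\,c \tag{5.7c}$$

$$\mu_{4,0} = \beta_{n,0}(d - \alpha_{n,0}) \tag{5.7d}$$

where $\alpha_n$ and $\beta_n$ are taken for $p = 0$ and therefore denoted by $\alpha_{n,0}$ and $\beta_{n,0}$.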
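Finally, it remains to determine $P(0)$. In accordance with (4.6), for equal basins we have $\mathbf{v}(p) = (\mathbf{a}_0 + p\,\mathbf{j})^n$ and therefore $\mathbf{v}(p) - z\,\mathbf{j} = \mathbf{a}_0^n + np\,\mathbf{a}_0^{n-1} + \dots - z\,\mathbf{j}$. Since $\mathbf{a}_0 = \lim_{p \to 0} \mathbf{a}$, we have $\mathbf{a}_0^n = \lim_{p \to 0} \mathbf{a}^n$ and similarly $\mathbf{a}_0^{n-1} = \lim_{p \to 0} \mathbf{a}^{n-1}$. Then, we have

$$\Delta_n = |\mathbf{a}_0^n + np\,\mathbf{a}_0^{n-1} + \dots - z\,\mathbf{j}| \tag{5.8}$$

where

$$\mathbf{a}_0^n = \beta_{n,0} \begin{Vmatrix} a - \alpha_{n,0} & b\\ c & d - \alpha_{n,0}\end{Vmatrix} \tag{5.9}$$

$$\mathbf{a}_0^{n-1} = \beta_{n-1,0} \begin{Vmatrix} a - \alpha_{n-1,0} & b\\ c & d - \alpha_{n-1,0}\end{Vmatrix}. \tag{5.10}$$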


Although the coefficient at the linear term of determinant A, can be easily computed, 
the general solution involves complicated expressions. Therefore we shall not deal 
with the general solution any further and will later give a special method for obtaining 
the solution in the case of a chain composed from equal basins. 

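Remark: More lucid results can be obtained when the eigenvalues of the matrix $\mathbf{a}^n - z\,\mathbf{j}$ are used. These eigenvalues are given by $\Lambda_1 = \lambda_1^n - z$ and $\Lambda_2 = \lambda_2^n - z$, where $\lambda_1$ and $\lambda_2$ are the eigenvalues of the matrix $\mathbf{a}$ (cf. (4.4)). Consequently, the determinant $\Delta_n$ of the matrix $\mathbf{a}^n - z\,\mathbf{j}$ is given by

$$\Delta_n = \Lambda_1 \Lambda_2 = (\lambda_1^n - z)(\lambda_2^n - z). \tag{5.11}$$

Under the stability condition we have $\lim_{p \to 0} \Delta_n = 0$, because $(\lambda_{1,0}^n - z)(\lambda_{2,0}^n - z) = 0$ (cf. (4.16a) and (4.16b)).

Assuming that by (4.4) $\lambda_1 = p + \lambda_{1,0}$, $\lambda_2 = p + \lambda_{2,0}$, we have

$$\Delta_n = [(p + \lambda_{1,0})^n - z]\,[(p + \lambda_{2,0})^n - z].$$

For the determination of the coefficient $P(0)$ at the linear term it suffices to consider the expression

$$(\lambda_{1,0}^n + np\,\lambda_{1,0}^{n-1} - z)(\lambda_{2,0}^n + np\,\lambda_{2,0}^{n-1} - z)$$

so that

$$P(0) = n\left[(\lambda_{2,0}^n - z)\lambda_{1,0}^{n-1} + (\lambda_{1,0}^n - z)\lambda_{2,0}^{n-1}\right].$$

Finally, we have

$$P(0) = n\,\lambda_{1,0}^{n-1}\lambda_{2,0}^{n-1}\left[\lambda_{1,0} + \lambda_{2,0} - z\left(\frac{1}{\lambda_{1,0}^{n-1}} + \frac{1}{\lambda_{2,0}^{n-1}}\right)\right]. \tag{5.12}$$

Thus $P(0)$ is determined and the result must coincide with that obtained from equations (5.8) through (5.10).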
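Formula (5.12) can be cross-checked numerically against the definition of $P(0)$ as the coefficient at the first power of $p$ in (5.11). The Python sketch below does this with a central difference; the function names and the step size `h` are assumptions made for the illustration, and the eigenvalues used in the usage note are illustrative.

```python
def p0_formula(lam1, lam2, z, n):
    """Coefficient P(0) at the linear term of Delta_n(p), formula (5.12)."""
    return n * (lam1 * lam2) ** (n - 1) * (
        lam1 + lam2 - z * (lam1 ** (1 - n) + lam2 ** (1 - n)))

def delta_n(p, lam1, lam2, z, n):
    """Delta_n(p) = ((p + lam1)^n - z) ((p + lam2)^n - z), see (5.11)."""
    return ((p + lam1) ** n - z) * ((p + lam2) ** n - z)

def p0_numeric(lam1, lam2, z, n, h=1e-6):
    """Central-difference estimate of the derivative of Delta_n at p = 0."""
    return (delta_n(h, lam1, lam2, z, n)
            - delta_n(-h, lam1, lam2, z, n)) / (2.0 * h)
```

Calling both functions with, say, `lam1 = 17.4`, `lam2 = 1.0`, `z = 1.0`, `n = 3` (a case in which the stability condition (4.16b) holds) gives two values that agree to within the accuracy of the finite difference.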


In order to demonstrate the introduced computation procedure and to show its numerical feasibility, several examples follow.
6. NUMERICAL EXAMPLES

1. The following values are given for cultivation in a single basin furnished with the circulation of the medium (Fig. 4 on p. 63) and with the take-off of the yield (all values per day):

$\mu_{2,2} = 4$, $\delta_1 = 0{,}4$,
$\mu_{2,4} = 2$, $\delta_2 = 2$,
$\mu_{1,2} = 4$, $D'' = 3$ ($D_1'' = 2$, $D_2'' = D' = 1$).

By (2.1) we can easily verify that all parameters have dimension $t^{-1}$. In our case the unit is 1 day. The values were chosen in such a way that the stability condition (2.5) is satisfied. Our problem is to study transient phenomena and stable values for the average number of microorganisms in the basin and to compute the yield.

Although it is intuitively clear that the fourterminal in question is in fact "isolated" and thus equation (2.4) applies, we shall start from the more general equation (2.3). According to it we have, after inserting the given values,

$$\begin{Vmatrix}\bar N_1'\\ \bar N_2'\end{Vmatrix} = \begin{Vmatrix} p + 7{,}4 & -16\\ -4 & p + 11\end{Vmatrix}\begin{Vmatrix}\bar N_1\\ \bar N_2\end{Vmatrix} - p\begin{Vmatrix}N_{1,0}\\ N_{2,0}\end{Vmatrix}$$

where $N_{1,0}$ and $N_{2,0}$ are the expected values for $t = 0$. We shall deal with them for a while as with general numbers. Clearly $\bar N_1' = \bar N_1$ and $\bar N_2' = \bar N_2$; thus we have

$$\text{(A)}\qquad \begin{Vmatrix} p + 6{,}4 & -16\\ -4 & p + 10\end{Vmatrix}\begin{Vmatrix}\bar N_1\\ \bar N_2\end{Vmatrix} = p\begin{Vmatrix}N_{1,0}\\ N_{2,0}\end{Vmatrix}.$$

The determinant of the square matrix in (A) equals 0 for $p = 0$; this proves that the culture stability condition is fulfilled. For $\bar N_1$ and $\bar N_2$ we get from (A)

$$\text{(B)}\qquad \bar N_1 = \frac{1}{p + 16{,}4}\left[(p + 10)N_{1,0} + 16 N_{2,0}\right]$$

$$\text{(C)}\qquad \bar N_2 = \frac{1}{p + 16{,}4}\left[4 N_{1,0} + (p + 6{,}4)N_{2,0}\right].$$

These functions are the Laplace-Wagner transforms of the expected values $\bar N_1(t)$ and $\bar N_2(t)$ of the number of both studied types of microorganisms in dependence on time. Passing to limits in (B) and (C) for $p \to 0$ we get by Tauber's theorem the values of $\bar N_1(t)$ and $\bar N_2(t)$ for $t \to \infty$. Evidently

$$\text{(D)}\qquad \bar N_{1,s} = 0{,}61 N_{1,0} + 0{,}975 N_{2,0}$$

$$\text{(E)}\qquad \bar N_{2,s} = 0{,}244 N_{1,0} + 0{,}39 N_{2,0}.$$

Clearly the asymptotic values $\bar N_{1,s}$ and $\bar N_{2,s}$ are linear combinations of the initial values $N_{1,0}$ and $N_{2,0}$, the ratio $\bar N_{1,s}/\bar N_{2,s}$ being equal to 2,5. A particular case is if

$$\bar N_{1,s} = N_{1,0}; \qquad \bar N_{2,s} = N_{2,0}.$$

Then also the initial values must satisfy the relation

$$\text{(F)}\qquad N_{1,0} = 2{,}5 N_{2,0}.$$

If $N_{1,0} < 2{,}5 N_{2,0}$ then $\bar N_{1,s} > N_{1,0}$, $\bar N_{2,s} < N_{2,0}$, and if $N_{1,0} > 2{,}5 N_{2,0}$ then $\bar N_{1,s} < N_{1,0}$, $\bar N_{2,s} > N_{2,0}$ (see Fig. 5 on p. 63).

Now let us find the inverse images of (B) and (C). We get

$$\text{(G)}\qquad \bar N_1(t) = 0{,}61 N_{1,0} + 0{,}975 N_{2,0} + (0{,}39 N_{1,0} - 0{,}975 N_{2,0})\,e^{-16{,}4t}$$

$$\text{(H)}\qquad \bar N_2(t) = 0{,}244 N_{1,0} + 0{,}39 N_{2,0} + (-0{,}244 N_{1,0} + 0{,}61 N_{2,0})\,e^{-16{,}4t}.$$

From these relations it follows that the culture process roughly stabilizes after the period

$$t_0 = \frac{3}{16{,}4} \approx 0{,}183 \text{ of a day}.$$

Further, if $N_{1,0} = 2{,}5 N_{2,0}$ then the culture process is stabilized immediately from the instant $t = 0$, i.e. no transient phenomena occur. Finally, by (3.5), the number of microorganisms in the yield reservoir is increasing and given by

$$\text{(K)}\qquad \bar N_{1R}(t) + \bar N_{2R}(t) = \bar N_{TOT,R}(t) = D_1'' \int_0^t \left[\bar N_1(\tau) + \bar N_2(\tau)\right] d\tau = (1{,}708 N_{1,0} + 2{,}73 N_{2,0})\,t + (0{,}018 N_{1,0} - 0{,}0445 N_{2,0})(1 - e^{-16{,}4t}).$$

If $N_{1,0} = 2{,}5 N_{2,0}$, the yield in the reservoir will increase linearly in time from the very beginning. Now, let us only consider the yield in the stable state. Assuming that we know $N_{1,0} + N_{2,0} = N_{0,0}$, the slope of the growth of microorganisms in the yield reservoir is

$$1{,}708 N_{1,0} + 2{,}73 N_{2,0} = 2{,}73 N_{0,0} - 1{,}022 N_{1,0}.$$

Evidently, this slope and therefore the yield also is maximal if $N_{1,0} = 0$. This means that at the beginning only mature microorganisms of the second type should be inset. In this model the magnitude of initial values does influence the yield. The yield is greater if the initial amount $N_{0,0}$ of microorganisms is greater. Assuming the same concentration, this results in a larger basin. For maximal output we must have $N_{2,0} = N_{0,0}$; $N_{1,0} = 0$, so that in the stable state the amount of microorganisms in the yield reservoir increases according to the law

$$\bar N_{TOT,R} = 2{,}73 N_{0,0}\,t,$$

the ratio $\bar N_{1R}/\bar N_{2R}$ in the yield reservoir being equal to $0{,}975/0{,}39 = 2{,}5$. Thus, immature organisms prevail against mature ones.
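The transient behaviour of the first example can be checked by integrating the time-domain form of (A), $d\bar N/dt = -M \bar N$ with $M$ the matrix of (A) at $p = 0$, and comparing with the closed forms (G) and (H). The following Python sketch does this with a classical Runge-Kutta scheme; the function names, the step count and the use of unrounded coefficients (for example 10/16,4 instead of 0,61) are choices made for this illustration only.

```python
import math

M = ((6.4, -16.0), (-4.0, 10.0))   # matrix of (A) taken at p = 0

def rhs(n):
    """Right-hand side of dN/dt = -M N."""
    return (-(M[0][0] * n[0] + M[0][1] * n[1]),
            -(M[1][0] * n[0] + M[1][1] * n[1]))

def rk4(n0, t_end, steps=2000, d_res=2.0):
    """Integrate the first moments; also accumulate the reservoir content
    d_res * integral of (N1 + N2), cf. (K), by the trapezoidal rule."""
    h = t_end / steps
    n, total = n0, 0.0
    for _ in range(steps):
        k1 = rhs(n)
        k2 = rhs((n[0] + 0.5 * h * k1[0], n[1] + 0.5 * h * k1[1]))
        k3 = rhs((n[0] + 0.5 * h * k2[0], n[1] + 0.5 * h * k2[1]))
        k4 = rhs((n[0] + h * k3[0], n[1] + h * k3[1]))
        new = (n[0] + h * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0,
               n[1] + h * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0)
        total += d_res * 0.5 * h * (n[0] + n[1] + new[0] + new[1])
        n = new
    return n, total

def closed_form(n10, n20, t):
    """Expressions (G) and (H) with unrounded coefficients."""
    e = math.exp(-16.4 * t)
    n1 = (10 * n10 + 16 * n20) / 16.4 + (6.4 * n10 - 16 * n20) / 16.4 * e
    n2 = (4 * n10 + 6.4 * n20) / 16.4 + (-4 * n10 + 10 * n20) / 16.4 * e
    return n1, n2
```

Starting from `(1.0, 0.0)` and integrating over one day, `rk4` and `closed_form` should agree closely, and the accumulated reservoir content should follow (K) with $D_1'' = 2$.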
2. As the second example we shall give the computation of continual culture of microorganisms in a closed system composed of three basins, in all of which the generation parameters and conditions are the same. The problem is to determine $D_1'$ so that the stability condition is fulfilled, to compute the stable values $\bar N_{1,s}$, $\bar N_{2,s}$ in all basins, and to determine the yield. Finally, we will compare this case with the preceding one assuming a single basin and discuss the results obtained.

Concerning the parameters we assume in all three basins:

$\mu_{1,2} = 4$, $\mu_{2,2} = 4$, $\mu_{2,4} = 2$,
$\delta_1 = 0{,}4$, $\delta_2 = 2$.

Further, we have:

$$D_1'' = D_2'' = D_3'' = 3.$$

The basic square matrix $\mathbf{a}$ reads

$$\text{(a)}\qquad \mathbf{a} = \begin{Vmatrix} p + 7{,}4 & -16\\ -4 & p + 11\end{Vmatrix}.$$

Its characteristic values $(\lambda_0)_{1,2}$ for $p = 0$ follow from the equation

$$\lambda_0^2 - 18{,}4\lambda_0 + 17{,}4 = 0$$

and therefore

$$\text{(b)}\qquad (\lambda_0)_{1,2} = \begin{cases} 17{,}4\\ 1{,}0.\end{cases}$$

Since only the root $\lambda_{2,0}$ is admissible, the stability condition follows from equation (4.16b) and, because $D_2' = D_3' = 3$ by (3.1c), we have

$$9 D_1' = 1,$$

i.e.

$$\text{(c)}\qquad D_1' = \tfrac{1}{9},$$

so that $D_1' < D_3'' = 3$ and $D_{1,3}'' = 3 - \tfrac{1}{9} = 2{,}889$.

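(If the root $\lambda_{1,0}$ and therefore equation (4.16a) is considered, then $D_1' > D_3''$, which contradicts our requirements.)

Now, we shall turn our attention to the calculation of the stable state.

The stable amounts of microorganisms in the third basin are given by the formulae (5.4a), (5.4b) and (5.7a)-(5.7d). First, we determine the necessary parameters. We have

$$\text{(d)}\qquad z = D_1' D_2' D_3' = 1.$$

By (5.7a, b, c, d) we get

$$\text{(e)}\qquad \mu_{1,0} = (17{,}4^3 - 1)\,\frac{7{,}4}{16{,}4} - (17{,}4^2 - 1)\,\frac{17{,}4}{16{,}4} \approx 2048$$

$$\text{(f)}\qquad \mu_{2,0} = (17{,}4^3 - 1)\,\frac{-16}{16{,}4} \approx -5120$$

$$\text{(g)}\qquad \mu_{3,0} = (17{,}4^3 - 1)\,\frac{-4}{16{,}4} \approx -1280$$

$$\text{(h)}\qquad \mu_{4,0} = (17{,}4^3 - 1)\,\frac{11}{16{,}4} - (17{,}4^2 - 1)\,\frac{17{,}4}{16{,}4} \approx 3203$$

and by (5.12),

$$\text{(i)}\qquad P(0) = 3 \cdot 17{,}4^2 \left(18{,}4 - 1 - \frac{1}{17{,}4^2}\right),$$

i.e.

$$\text{(j)}\qquad P(0) \approx 15780.$$

Hence, we can write

$$\bar N_{1,3,s} = \frac{9}{15780}\left[N_{1,0}(3203 - 1) + 5120 N_{2,0}\right]$$

$$\bar N_{2,3,s} = \frac{9}{15780}\left[1280 N_{1,0} + N_{2,0}(2048 - 1)\right].$$

Thus

$$\text{(k)}\qquad \bar N_{1,3,s} = 1{,}83 N_{1,0} + 2{,}92 N_{2,0}$$

$$\text{(l)}\qquad \bar N_{2,3,s} = 0{,}73 N_{1,0} + 1{,}17 N_{2,0}.$$

Substituting, similarly as in the case of a single basin, $N_{1,0} + N_{2,0} = N_{0,0}$, i.e. $N_{2,0} = N_{0,0} - N_{1,0}$, into (k) and (l), we get

$$\text{(m)}\qquad \bar N_{1,3,s} = 2{,}92 N_{0,0} - 1{,}09 N_{1,0}$$

$$\text{(n)}\qquad \bar N_{2,3,s} = 1{,}17 N_{0,0} - 0{,}44 N_{1,0}$$

and we have

$$\frac{\bar N_{1,3,s}}{\bar N_{2,3,s}} \approx 2{,}5.$$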
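The chain of formulae (5.5), (5.7a)-(5.7d), (5.12) and (5.4a), (5.4b) used in this example can be written down as a short Python sketch. The function name and argument names are hypothetical; the off-diagonal elements `b` and `c` are passed with their signs, as in the matrix (a). The sketch evaluates the formulae in floating point, so its results differ in the last digits from the rounded values quoted above.

```python
import math

def stable_end_of_chain(a, b, c, d, n, d_in_1, z, n10, n20):
    """Stable first moments in the last of n equal basins.

    a, b, c, d : elements of the basic matrix at p = 0
    d_in_1     : D'_1; z : product D'_1 D'_2 ... D'_n
    Returns ((mu1, mu2, mu3, mu4), P(0), (N1s, N2s)).
    """
    half = 0.5 * (a + d)
    root = math.sqrt(half * half - (a * d - b * c))
    lam1, lam2 = half + root, half - root
    # Sylvester coefficients of (5.5), taken at p = 0
    beta = (lam1 ** n - lam2 ** n) / (lam1 - lam2)
    alpha = (lam1 * lam2 * (lam1 ** (n - 1) - lam2 ** (n - 1))
             / (lam1 ** n - lam2 ** n))
    mu1, mu2 = beta * (a - alpha), beta * b        # (5.7a), (5.7b)
    mu3, mu4 = beta * c, beta * (d - alpha)        # (5.7c), (5.7d)
    p0 = n * (lam1 * lam2) ** (n - 1) * (
        lam1 + lam2 - z * (lam1 ** (1 - n) + lam2 ** (1 - n)))   # (5.12)
    factor = z / (d_in_1 * p0)
    n1 = factor * (n10 * (mu4 - z) - n20 * mu2)    # (5.4a)
    n2 = factor * (-n10 * mu3 + n20 * (mu1 - z))   # (5.4b)
    return (mu1, mu2, mu3, mu4), p0, (n1, n2)
```

For the data of this example the call would be `stable_end_of_chain(7.4, -16.0, -4.0, 11.0, 3, 1/9, 1.0, n10, n20)`. Under the stability condition the returned pair is proportional to the eigenvector of the basic matrix that belongs to $\lambda_{2,0}$, so the ratio of the two stable values does not depend on the initial values.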

It is again evident that the greatest yield is obtained if $N_{1,0} = 0$ and $N_{2,0} = N_{0,0}$. The yield increases in the stable state linearly according to the law

$$\text{(p)}\qquad \bar N_{TOT,R} = D_{1,3}''\,(2{,}56 N_{1,0} + 4{,}09 N_{2,0})\,t.$$

Under optimal conditions, i.e. if $N_{2,0} = N_{0,0}$ and if $D_{1,3}'' = 2{,}889$ (cf. (c)), we have

$$\text{(q)}\qquad \bar N_{TOT,R} = 11{,}8 N_{0,0}\,t.$$

Taking into consideration that for a single basin we had $\bar N_{TOT,R} = 2{,}73 N_{0,0}\,t$, we conclude that for three basins and for the same $N_{0,0}$ the yield is in absolute value cca 4 times greater. Now, let us compute the stable amounts of microorganisms in the first and the second basin.

Since

$$\begin{Vmatrix}\bar N_{1,2,s}\\ \bar N_{2,2,s}\end{Vmatrix} = \frac{\mathbf{a}_0}{D_3'}\begin{Vmatrix}\bar N_{1,3,s}\\ \bar N_{2,3,s}\end{Vmatrix}$$

we have under optimal conditions $N_{1,0} = 0$ in the second basin

$$\begin{Vmatrix}\bar N_{1,2,s}\\ \bar N_{2,2,s}\end{Vmatrix} = \frac{N_{0,0}}{3}\begin{Vmatrix} 7{,}4 & -16\\ -4 & 11\end{Vmatrix}\begin{Vmatrix} 2{,}92\\ 1{,}17\end{Vmatrix} = N_{0,0}\begin{Vmatrix} 0{,}963\\ 0{,}397\end{Vmatrix}.$$

For the first basin we get

$$\begin{Vmatrix}\bar N_{1,1,s}\\ \bar N_{2,1,s}\end{Vmatrix} = \frac{N_{0,0}}{3}\begin{Vmatrix} 7{,}4 & -16\\ -4 & 11\end{Vmatrix}\begin{Vmatrix} 0{,}963\\ 0{,}397\end{Vmatrix} = N_{0,0}\begin{Vmatrix} 0{,}2637\\ 0{,}1717\end{Vmatrix}.$$

For the stable state and optimal conditions we summarize as follows.

BASIN NUMBER                 I         II        III
$\bar N_{1,i,s}/N_{0,0}$     0,2637    0,963     2,92
$\bar N_{2,i,s}/N_{0,0}$     0,1717    0,397     1,17

Assuming the same concentration $C$ of alive microorganisms in all three basins, the total volume $V_{TOT}$ of all basins in the second example is

$$\text{(s)}\qquad V_{TOT} = \frac{N_{0,0}}{C}\,(2{,}92 + 0{,}963 + 0{,}2637 + 1{,}17 + 0{,}397 + 0{,}1717) = 5{,}88\,\frac{N_{0,0}}{C}.$$

In the first example the volume of the single basin is

$$\text{(t)}\qquad V = 1{,}365\,\frac{N_{0,0}}{C}.$$

Computing in both examples the yield $W$ per unit of time referred to the unit of the volume, we get for the second example (3 basins)

$$W_3 = \frac{11{,}8}{5{,}88}\,C = 2{,}0\,C$$

and for the first example (1 basin)

$$W_1 = \frac{2{,}73}{1{,}365}\,C = 2{,}0\,C.$$

The exploitation of the volume is the same here for both cases.
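The step from basin to basin and the yield per unit volume can be reproduced with the Python sketch below. It assumes, as in the text, that the stable values obey $\bar N_{k-1} = \mathbf{a}_0 \bar N_k / D_k'$; the function names are hypothetical and the concentration $C$ is set to 1, so the yield is returned in units of $C$.

```python
def back_through_chain(a0, d_in, last):
    """Stable moments in basins 1..n from those in basin n.

    a0   : basic 2x2 matrix at p = 0
    d_in : [D'_1, ..., D'_n]
    last : (N1, N2) in the last basin
    """
    values = [last]
    for k in range(len(d_in) - 1, 0, -1):     # d_in[k] is D'_{k+1}
        n1, n2 = values[0]
        values.insert(0, ((a0[0][0] * n1 + a0[0][1] * n2) / d_in[k],
                          (a0[1][0] * n1 + a0[1][1] * n2) / d_in[k]))
    return values

def yield_per_volume(values, d_reservoir):
    """Yield per unit time and unit volume, in units of the concentration C."""
    total = sum(n1 + n2 for n1, n2 in values)     # proportional to V_TOT
    return d_reservoir * sum(values[-1]) / total
```

A design remark, offered with some caution: when the vector for basin III is taken exactly along the eigenvector belonging to $\lambda_{2,0} = 1$ (ratio 2,5 : 1), each backward step simply divides both amounts by 3, so basins II and I hold one third and one ninth of the amounts in basin III, and the yield per unit volume comes out as $2{,}0\,C$. Starting instead from the rounded pair 2,92 and 1,17, the small deviation from the eigenvector is magnified at every step by the large eigenvalue 17,4, which appears to explain why the tabulated values for basin I do not show the ratio 2,5.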
Example 3: Let us assume cultivation in two basins, with different conditions in each of them. Let for the first basin (see Fig. 1 on p. 59) $\mu_{1,2} = 0$; $\delta_1, \delta_2, \mu_{2,2} \neq 0$ and, on the contrary, for the second basin $\mu_{2,2} = \mu_{2,4} = 0$; $\delta_1, \delta_2, \mu_{1,2} \neq 0$. Further, let $D_1''$ and $D_2''$ be given. The problem is whether and when such a system can be stable. Let us write the matrices $\mathbf{a}_{0,1}$ and $\mathbf{a}_{0,2}$ (see (4.5) and (4.5a)). With respect to the given values they are of the form

$$\mathbf{a}_{0,1} = \begin{Vmatrix} a_1 & -b_1\\ 0 & d_1\end{Vmatrix}; \qquad \mathbf{a}_{0,2} = \begin{Vmatrix} a_2 & 0\\ -c_2 & d_2\end{Vmatrix}.$$

Clearly, by (4.8)

$$\mathbf{v}(0) = \mathbf{a}_{0,1}\mathbf{a}_{0,2} = \begin{Vmatrix} a_1 a_2 + b_1 c_2 & -b_1 d_2\\ -d_1 c_2 & d_1 d_2\end{Vmatrix}.$$

The trace of the matrix $\mathbf{v}(0)$ is therefore $a_1 a_2 + b_1 c_2 + d_1 d_2$ and the determinant is $a_1 d_1 a_2 d_2$. The stability condition is by (4.13) equal to

$$z^2 - z\,(a_1 a_2 + b_1 c_2 + d_1 d_2) + a_1 d_1 a_2 d_2 = 0.$$

In the stable state this equation should have one root equal to zero. This is possible only under $a_1 d_1 a_2 d_2 = 0$, which contradicts our assumptions. Near the stable state the term $\sqrt{a_1 d_1 a_2 d_2}$ must be small compared with $\tfrac{1}{2}(a_1 a_2 + b_1 c_2 + d_1 d_2)$. Then one root from $z_{1,2}$ is small and positive and the condition (4.16c) can be satisfied.

For detailed analysis we must remember that

$$a_1 = \delta_1 + D_1''; \qquad b_1 = 2(\mu_{2,2} + 2\mu_{2,4}) \neq 0; \qquad d_1 = \delta_2 + D_1'' + \mu_{2,2} + \mu_{2,4}$$

$$a_2 = \delta_1 + D_2'' + \mu_{1,2}; \qquad b_2 = 0; \qquad c_2 = \mu_{1,2}; \qquad d_2 = \delta_2 + D_2''$$

$$D_2' = D_1''; \qquad z = D_1' D_2' = D_1' D_1''; \qquad 0 < D_1' < D_2''.$$

Then one can see that fulfilling the stability condition is sometimes by no means simple.


7. CULTURE CHAIN 


Sometimes it is advantageous to couple several equal basins into a chain. The 


cultivation conditions are the same in all the links. 

Though it is possible to compute chains using the formulae given in Section 4, it seems purposeful, taking into account their specialities, to develop for them a special method of computation. To do this, we shall again start from equation (2.3), which holds for a single link, where we assume the values $N_{1,0}$ and $N_{2,0}$ for $t = 0$. Computing the matrix products we get the equations

$$D' \bar N_{1,k-1} = a \bar N_{1,k} - b \bar N_{2,k} - p N_{1,0} \tag{7.1a}$$

$$D' \bar N_{2,k-1} = -c \bar N_{1,k} + d \bar N_{2,k} - p N_{2,0}. \tag{7.1b}$$


We denoted

$$a = p + \delta_1 + D'' + \mu_{1,2}; \qquad b = 2(\mu_{2,2} + 2\mu_{2,4}); \qquad c = \mu_{1,2}; \qquad d = p + \delta_2 + D'' + \mu_{2,2} + \mu_{2,4};$$

$k$ is the varying index (see Fig. 7 on p. 63).

If all the links are the same, then also $a$, $b$, $c$, $d$, $D'$ are the same. If the expected numbers of microorganisms at time $t = 0$ are the same in all links, then also the initial values $N_{1,0}$ and $N_{2,0}$ are the same for all the links and equations (7.1a) and (7.1b) apply for every link of the chain. Therefore we can regard them as simultaneous difference equations describing the phenomena in the chain that had at the beginning the same concentration of microorganisms of both types in all the links. Similarly as in previous sections, $\bar N_{1,k}$ and $\bar N_{2,k}$ are the Laplace-Wagner transforms of the functions $\bar N_{1,k}(t)$ and $\bar N_{2,k}(t)$, which determine the expected values of the number of microorganisms of both types in dependence on time. To solve equations (7.1a, b) we first eliminate one of the unknowns, say $\bar N_{2,k}$. By (7.1a) we get

$$\bar N_{2,k} = \frac{1}{b}\left(a \bar N_{1,k} - D' \bar N_{1,k-1} - p N_{1,0}\right)$$

$$\bar N_{2,k-1} = \frac{1}{b}\left(a \bar N_{1,k-1} - D' \bar N_{1,k-2} - p N_{1,0}\right) \tag{7.2}$$

and substituting this result into (7.1b) we obtain after modification the equation

$$D'(a + d)\,\bar N_{1,k-1} - (ad - bc)\,\bar N_{1,k} - (D')^2\,\bar N_{1,k-2} = p N_{1,0}(D' - d) - p\,b N_{2,0}. \tag{7.3}$$

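The particular solution of this equation is given by

$$\bar N_1 = \text{const} = p K_0 = -p\,\frac{N_{1,0}(D' - d) - b N_{2,0}}{(D')^2 - D'(a + d) + (ad - bc)}. \tag{7.4}$$

The solution of the homogeneous equation, corresponding to (7.3), reads

$$\bar N_{1,k} = e^{\gamma k}$$

where $\gamma$ satisfies the characteristic equation

$$(D')^2 e^{-2\gamma} - D'(a + d)\,e^{-\gamma} + (ad - bc) = 0. \tag{7.5}$$

Since $a + d$ is the trace and $ad - bc$ the determinant of the basic matrix $\mathbf{a}$, the eigenvalues of which we denote by $\lambda_1$, $\lambda_2$, we can rewrite equation (7.5) into

$$(D')^2 e^{-2\gamma} - D'(\lambda_1 + \lambda_2)\,e^{-\gamma} + \lambda_1 \lambda_2 = 0$$

which can be decomposed into the product

$$(D' e^{-\gamma} - \lambda_1)(D' e^{-\gamma} - \lambda_2) = 0$$

so that

$$(e^{\gamma})_{1,2} = \frac{D'}{\lambda_{1,2}}.$$

The general solution of the homogeneous equation is therefore given by

$$\bar N_{1,k} = K_1 \left(\frac{D'}{\lambda_1}\right)^k + K_2 \left(\frac{D'}{\lambda_2}\right)^k$$

and the general solution of the equation (7.3) is given by

$$\bar N_{1,k} = K_1 \left(\frac{D'}{\lambda_1}\right)^k + K_2 \left(\frac{D'}{\lambda_2}\right)^k + p K_0. \tag{7.6}$$

Now, we can derive the expression for $\bar N_{2,k}$. Using (7.6) we get by (7.2) after modification

$$b\,\bar N_{2,k} = K_1 \left(\frac{D'}{\lambda_1}\right)^k (a - \lambda_1) + K_2 \left(\frac{D'}{\lambda_2}\right)^k (a - \lambda_2) + p\left[K_0 (a - D') - N_{1,0}\right]. \tag{7.7}$$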
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Expression (7.4) for K, can be simplified by introducing the characteristic value Aj, 
Ag of the basic matrix. We can easily derive that 


— bNag— Nj D'—d) 
(D’—A,)(D'—Aj) 


J (7.8) 


The "constants" $K_1$, $K_2$ can be determined from the boundary conditions of the chain. In general, they as well as $\lambda_1$, $\lambda_2$ (see (4.4)) depend on $p$.

Let us compute the constants $K_1$, $K_2$ e.g. in the case that $\bar N_{1,0}$ and $\bar N_{2,0}$ are given at the beginning of the chain, provided the initial values are equal to 0, i.e. at the moment $t = 0$ no microorganisms were present in the whole chain. We have by (7.6) and (7.7)

Njo— K,+K, 


bN o = Ky(a—A,)+-K(a—A,) 


so that Kë I y Dadada] e (7.9) 
KT 


K,= T [—Â, o(a —A) +Ñ] < (7.10) 
Ad: 


Substituting these expressions into (7.6) and (7.7), where we set, according to our assumption, initial values equal to zero, we get


Samy aje (2) eng (2 

Sa ETI ve (7.1) 
e {uO [( E 

+a |(a—2,)( =A Jaan) ( A 7) (7.12) 


It can easily be checked that for $k = 0$ the boundary conditions of the chain are satisfied.


Remark: The results (7.11) and (7.12) can be modified using the substitution 
D' = Ce-B— = Qe 
Ay 2 

Clearly es TË D' 


DT 


I mr E 
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di 
[Figs. 3, 4, 6 and 7]
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and for C to be real we must have ad > be. Equations (7.11) and (7.12) express the 
relation between the Laplace-Wagner transforms of the expressions for the expected values $N_{1,k}(t)$ and $N_{2,k}(t)$ at the end and at an arbitrary place of the chain under zero initial conditions. They can also be utilized for computing the stable state, when the expected values are independent of time. Then it suffices to set everywhere $p = 0$ and instead of $\lambda_1$, $\lambda_2$ use $\lambda_{1,0}$, $\lambda_{2,0}$, where
$$\lambda_{1,0} = \lim_{p\to 0} \lambda_1, \qquad \lambda_{2,0} = \lim_{p\to 0} \lambda_2 \quad \text{(see (4.4))}.$$

The fact that $(D')^k$ is contained in all the terms proves its significance. If $\lambda_{1,0} > \lambda_{2,0}$, then for $k \gg 1$ the terms $(1/\lambda_{1,0})^k$ can be omitted with respect to the terms $(1/\lambda_{2,0})^k$, and the geometric increase of $\bar N_{1,k}$ and $\bar N_{2,k}$ along the chain is very clear.

The behaviour of the chain at the moment $t = 0$ can also be easily derived from equations (7.11) and (7.12). It suffices to pass to limits in (7.11) and (7.12) according to Abel for $p \to \infty$. If $N_{1,0}$ and $N_{2,0}$ are constants, then, in the first approximation, $N_{1,k}$ and $N_{2,k}$ increase in time similarly as the function $t^k$. That means, the greater the distance from the beginning of the chain (i.e. the greater $k$), the greater the delay. This results from the fact that microorganisms stay for some time in every basin before they proceed further.


8. CONCLUSIONS


In the present paper some important questions of continuous cultivation of microorganisms (e.g. algae) are dealt with. This stochastic process is studied as a linear multiparametric branching process with birth, aging, migration, and death. The purpose of the paper is to compute expected values and variances of the number of particular types of microorganisms in cultivation basins with continual circulation of the medium, which is supplied with nourishing substances and from which some part of the microorganisms is taken off as the yield. The conditions in the whole of the basin are assumed to be the same. In the paper the formulae both for the stationary and for the non-stationary state as well as the stability condition are given. Special attention is paid to the case when the system consists of several equal basins, i.e. when the system represents a chain. Besides general formulae, which enable the computation of the model, several numerical examples are given.

It follows from the calculations that in the steady state there is no essential qualitative difference between the cultivation in a single basin and in a system of mutually coupled basins. But the transient phenomena are different.
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të : 
THEORY OF AN EXPERIMENT ON THE REPLICATION OF DNA
By P. A. P. MORAN


Australian National University, Canberra


SUMMARY. The change in composition of bacterial DNA in a bacterium is calculated when the bacterium is transferred to a medium containing a different isotope of nitrogen and certain assumptions are made about the mode of replication of the chromosome. The results can be compared with experimental measurements.

Meselson and Stahl (1958) determined the change in composition of bacterial DNA following a change in medium from one containing all its nitrogen as N$^{15}$ to one containing only N$^{14}$. They showed that the DNA first became a hybrid, containing equal amounts of N$^{15}$ and N$^{14}$, and then became a mixture of hybrid and pure N$^{14}$ DNA. These experiments therefore confirmed the predictions of Watson and Crick (1953) that DNA replicates "semiconservatively", each strand of the double helix being associated with a strand of new material.

The units of DNA studied in these experiments each comprised about one hundredth of the bacterial chromosome, and inside the bacterium are probably joined end-to-end. Since few, if any, pure N$^{14}$ units appeared until most of the units were N$^{14}$N$^{15}$ hybrids it is clear that they are not replicated at random. Their behaviour appears to fit a scheme in which the whole chromosome is replicated by one process starting at one end and moving progressively to the other.
For convenience we shall refer to the N$^{15}$N$^{15}$, N$^{15}$N$^{14}$, and N$^{14}$N$^{14}$ units as "fully labelled", "half-labelled", and "unlabelled" and we assume that the relative proportions of these units are the relative proportions of the lengths of chromosomes which are N$^{15}$N$^{15}$, N$^{15}$N$^{14}$, and N$^{14}$N$^{14}$.

The purpose of the present paper is to calculate these proportions on one theory of the manner in which the chromosome replicates. The length of chromosome is taken as unity and the time scale is chosen so that the population doubles in unit time. The chromosomes are assumed to consist of two strands joined along their length which are torn apart starting from one end, a replicate being immediately formed at the point of division, this replicate containing nitrogen from the medium. From the above conventions about the units of time and length, the velocity of the point of division along the arm is also unity. When the process of tearing is complete there will be two complete chromosomes, each double-stranded, where there was only one before. It is then assumed that the two ends of each newly formed chromosome join together forming a chromosomal ring, and that this ring is then broken at a point which is randomly and uniformly distributed about the circumference. The process of division then immediately starts again at one of the two ends.
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Let n be the number of chromosomes in all the cells at time ¿= 0, and nit) 
the number at time $t$. Then
$$n(t) = n e^{\beta t}, \tag{1}$$
where $\beta = \ln 2 = 0.6931$. Let $N(t)$ be the total length of all the arms of the chromosomes so that if the length of tearing of an individual chromosome is $x$, the total length of the arms in this chromosome will be $1+x$. $N(t)$ will be a constant multiple of $n(t)$ and if $N$ is its initial value
$$N(t) = N e^{\beta t}. \tag{2}$$

We must first determine the probability distribution of $x$. It is clear that this has a continuous probability density between 0 and 1 with no concentrations of probability at any point. Let this probability density be $f(x)$. The number of chromosomes at time $T$ with $x$'s between $x$ and $x+dx$ will be $(0 \le x < 1-dx)$
$$n(T) f(x)\,dx. \tag{3}$$


Consider the corresponding frequency at a time $T+\Delta T$ where $\Delta T$ is small but $dx$ much smaller, and we also assume $x < 1-\Delta T$. Then this frequency is
$$n(T)\,e^{\beta \Delta T} f(x)\,dx,$$
and is also equal to $n(T) f(x-\Delta T)\,dx$. Thus
$$f(x-\Delta T) = f(x)\,e^{\beta \Delta T},$$
and letting $\Delta T \to 0$ we conclude that $f(x)$ is proportional to $e^{-\beta x}$, so that $f(1) = \tfrac{1}{2} f(0)$, which checks with the fact that there are obviously just twice as many chromosomes beginning to split as ones which are nearly completely split. Since the integral of $f(x)$ over $(0, 1)$ must be unity we have
$$f(x) = 2\beta e^{-\beta x} = 2^{1-x} \ln 2. \tag{4}$$
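The density (4) can be checked numerically. The following is a minimal sketch, not part of the original paper: the function names and the midpoint integration are illustrative choices. It confirms that $f$ integrates to unity, that $f(0) = 2f(1)$, and that the mean arm length $E(1+x)$ equals $1/\beta$, which suggests that the constant of proportionality in $N(t)$ is $N = n/\beta$.

```python
import math

BETA = math.log(2.0)  # growth rate: the population doubles in unit time


def f(x):
    """Density (4) of the torn length x on the interval (0, 1)."""
    return 2.0 * BETA * math.exp(-BETA * x)


def integrate(g, lo, hi, steps=20000):
    """Composite midpoint rule (an illustrative choice of quadrature)."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (i + 0.5) * h) for i in range(steps))


total = integrate(f, 0.0, 1.0)                                   # should be 1
ratio = f(0.0) / f(1.0)                                          # should be 2
mean_arm_length = integrate(lambda x: (1.0 + x) * f(x), 0.0, 1.0)

print(total, ratio, mean_arm_length * BETA)
```

All three printed values should be close to 1, 2 and 1 respectively.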
Let $A(t)$, $B(t)$, $C(t)$ be the total length of those parts of the arms of chromosomes which are respectively fully labelled, half-labelled, and unlabelled. We have
$$N(t) = A(t) + B(t) + C(t) \tag{5}$$
and at $t = 0$,
$$A(0) = N, \qquad B(0) = C(0) = 0. \tag{6}$$


We calculate the change of $A(t)$, $B(t)$, $C(t)$ with time, beginning with the interval $0 \le t \le 1$. At time $t$, a chromosome chosen at random will be split to a length $x$. If $x > t$ the chromosome was originally formed before $t = 0$ and so consists entirely of fully labelled parts. At time $t$, fully labelled material is therefore being destroyed and replaced by twice as much half-labelled material. If $x < t$ the chromosome was formed at time $t-x > 0$ and therefore then consisted of two portions, one of length $t-x$ of half-labelled material and one of length $1-t+x$ of fully labelled material. Since the chromosome formed a ring and then broke at a random point, the actual point of division at time $t$ is such that fully labelled material is being split with probability $1-t+x$, and half labelled material with probability $t-x$. Furthermore, at time $t$ the total number of chromosomes is $ne^{\beta t}$. Putting these facts together we find that


$$A'(t) = -n e^{\beta t} \left\{ \int_t^1 f(x)\,dx + \int_0^t (1-t+x) f(x)\,dx \right\}, \tag{7}$$

$$B'(t) = -2A'(t), \tag{8}$$

$$C'(t) = n e^{\beta t} \int_0^t (t-x) f(x)\,dx. \tag{9}$$


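Considering (7) first and inserting (4) we get
$$A'(t) = -n e^{\beta t} + 2nt e^{\beta t} + 2n\beta^{-1} - 2n\beta^{-1} e^{\beta t}.$$
Integrating this and using the fact that $A(0) = N$ we get
$$A(t) = n\beta^{-1}\{2 - e^{\beta t} + 2t\} + n\beta^{-2}\{4 + 2(\beta t - 1)e^{\beta t} - 2e^{\beta t}\}, \quad (0 \le t \le 1). \tag{10}$$
At $t = 1$ this is
$$A(1) = n\beta^{-1}(4 - e^{\beta}) + n\beta^{-2}\{4 + 2(\beta - 2)e^{\beta}\}, \tag{11}$$
which can be simplified on using $e^{\beta} = 2$.

Similarly by integrating (8) and (9) and using the fact that $B(0) = C(0) = 0$, we find
$$B(t) = 2n\beta^{-1}\{e^{\beta t} - 1 - 2t\} + 4n\beta^{-2}\{(2 - \beta t)e^{\beta t} - 2\}, \tag{12}$$
$$B(1) = -2n\beta^{-1}(3 + e^{\beta}) + 8n\beta^{-2}(e^{\beta} - 1), \tag{13}$$
$$C(t) = 2n\beta^{-1} t + 2n\beta^{-2}\{2 + (\beta t - 2)e^{\beta t}\}, \tag{14}$$
$$C(1) = 2n\beta^{-1}(e^{\beta} + 1) + 4n\beta^{-2}(1 - e^{\beta}). \tag{15}$$
These check with the fact that we must have
$$A(t) + B(t) + C(t) = n\beta^{-1} e^{\beta t}. \tag{16}$$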
Next consider the interval $1 \le t \le 2$. We again choose a chromosome at random and consider its $x$. If $x > t-1$ the chromosome was originally formed in the period $(0 < t < 1)$ and it was completed at the time $t-x < 1$. At that time it must therefore have consisted of a length $t-x$ of half-labelled material and a length $1-t+x$ of fully labelled material. After forming a ring it splits at a randomly chosen point and therefore at time $t$ the part which is dividing is fully labelled with probability $1-t+x$ and half labelled with probability $t-x$.

On the other hand if $x < t-1$ the chromosome was completed at time $t-x > 1$. It therefore consists only of half-labelled and unlabelled portions, whose relative lengths are the probabilities of the portions being torn at time $t$. Putting these facts together we therefore get
$$A'(t) = -n e^{\beta t} \int_{t-1}^{1} (1-t+x) f(x)\,dx, \tag{17}$$

$$B'(t) = 2n e^{\beta t} \int_{t-1}^{1} (1-t+x) f(x)\,dx, \tag{18}$$

$$C'(t) = n e^{\beta t} \left\{ \int_0^{t-1} f(x)\,dx + \int_{t-1}^{1} (t-x) f(x)\,dx \right\}. \tag{19}$$
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Integrating these equations and using the initial values (11), (13) and (15) 
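we get for $1 \le t \le 2$
$$\beta n^{-1} A(t) = 8 - 8\beta^{-1} - 4t + e^{\beta t}\{2 - t + 2\beta^{-1}\}, \tag{20}$$
$$\beta n^{-1} B(t) = -14 + 16\beta^{-1} + 8t - 2e^{\beta t}\{2 - t + 2\beta^{-1}\}, \tag{21}$$
$$\beta n^{-1} C(t) = 6 - 8\beta^{-1} - 4t + e^{\beta t}\{3 - t + 2\beta^{-1}\}. \tag{22}$$

After $t = 2$, $A(t) = 0$, and $B(t)$ remains constant. The course of $A(t)$, $B(t)$, and $C(t)$ during the interval $0 \le t \le 2$ is given in the following table: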


TABLE

  t      $\beta n^{-1}A$   $\beta n^{-1}B$   $\beta n^{-1}C$
  0        1.000       0           0
  0.2      0.857       0.286       0.005
  0.4      0.689       0.622       0.009
  0.6      0.525       0.950       0.041
  0.8      0.369       1.262       0.110
  1.0      0.229       1.542       0.229
  1.2      0.120       1.760       0.416
  1.4      0.060       1.880       0.700
  1.6      0.020       1.960       1.052
  1.8      0.002       1.996       1.484
  2.0      0           2.000       2.000
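The closed forms can be evaluated directly. The sketch below is an illustration added here, not the author's computation: it codes the scaled quantities $\beta n^{-1}A$, $\beta n^{-1}B$, $\beta n^{-1}C$ as read from (10), (12), (14) for $0 \le t \le 1$ and from (20) to (22) for $1 \le t \le 2$ (the function name is hypothetical), and prints them at the tabulated times. The printed values need not agree with the table to the third decimal, since the tabulated figures appear to have been rounded or computed to lower precision.

```python
import math

BETA = math.log(2.0)


def scaled_lengths(t):
    """Return (beta*A/n, beta*B/n, beta*C/n) at time t, for 0 <= t <= 2."""
    e = math.exp(BETA * t)
    if t <= 1.0:
        # Common bracket {4 + 2(beta*t - 2) e^{beta t}} / beta from (10), (12), (14).
        g = (4.0 + 2.0 * (BETA * t - 2.0) * e) / BETA
        a = 2.0 + 2.0 * t - e + g
        b = 2.0 * (e - 1.0 - 2.0 * t) - 2.0 * g
        c = 2.0 * t + g
    else:
        # Equations (20), (21), (22).
        h = 2.0 - t + 2.0 / BETA
        a = 8.0 - 8.0 / BETA - 4.0 * t + e * h
        b = -14.0 + 16.0 / BETA + 8.0 * t - 2.0 * e * h
        c = 6.0 - 8.0 / BETA - 4.0 * t + e * (h + 1.0)
    return a, b, c


for i in range(11):
    t = 0.2 * i
    a, b, c = scaled_lengths(t)
    # The last column repeats the identity beta/n * (A + B + C) = e^{beta t}.
    print(f"{t:3.1f}  {a:6.3f}  {b:6.3f}  {c:6.3f}  {a + b + c:6.3f}")
```

The two branches agree at $t = 1$, and the three columns sum to $e^{\beta t}$ at every $t$, which is the check stated in (16).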


It is easy to check that $\beta n^{-1}\{A(t)+B(t)+C(t)\} = e^{\beta t}$. From measurements of the ultraviolet absorption bands of the DNA such as are given in Meselson and Stahl's paper it would be possible to compare the above results with experiments, and also the predictions of alternative theories. Thus we might suppose that as soon as a new chromosome is completed and separated from its twin it begins dividing at once starting from one end without first forming a circle. In such a case we might suppose that, (a) it begins to divide from the same end as in the previous division; (b) it begins to divide from the opposite end to the previous division; (c) it begins to divide from either end at random. The results for each of these cases will differ from each other, and from (20), (21), and (22), and can be easily worked out by the same kind of argument.

I am indebted to Dr. H. J. Cairns for informing me of this problem.
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ON FINITENESS OF THE PROCESS OF CLUSTERING* 


By JERZY NEYMAN 
Statistical Laboratory, University of California, Berkeley 


SUMMARY. The study of the process of clustering leads to the consideration of the sum, say N, of random variables, where the number of components is random and unbounded. As a result N may be infinite with probability one. In addition to three easy theorems on the subject, the paper gives two examples where the divergence of N to infinity occurs in somewhat unexpected conditions.


1. INTRODUCTION

It is known (Neyman, 1955), that the most general homogeneous first order process of clustering can be generated by the following mechanism, supposed to operate in the Euclidean space $E$ of arbitrary dimensionality.

(i) Particles $C$, described as cluster centers, are Poisson-wise distributed in $E$. Let $\lambda$ represent the expected number of cluster centers in any Borel subset of $E$ of measure unity.

(ii) With every cluster center $C$ there is associated a positive integer valued random variable $\nu$, described as the number of members of the cluster centered at $C$. Numbers $\nu$ attached to different cluster centers are mutually independent and have the same distribution characterized by the probability generating function $G_\nu(t) = E(t^\nu)$, where $|t| \le 1$.

(iii) Given the position of a cluster center $C$ and given that the corresponding $\nu = n \ge 1$, $n$ particles $\pi_1, \ldots, \pi_n$, described as members of the cluster centered at $C$, are distributed in space in accordance with the following postulates. Let $U = (U_1, U_2, \ldots)$ represent the coordinates of $C$ and $X^{(i)} = (X_1^{(i)}, X_2^{(i)}, \ldots)$ the coordinates of the $i$-th member of the cluster. It is postulated that $X^{(i)}$ is a random variable with probability density, to be denoted by $f(x-u)$ and called the "structure" of the cluster, depending only upon the distance between the cluster center $C$ and the point $x = (x_1, x_2, \ldots)$ at which it is evaluated. The random variables $X^{(i)}$ for $i = 1, 2, \ldots, n$ are mutually independent. Also, they are independent of similar variables corresponding to other cluster centers.

(iv) For each particle $\pi$ located at some point $x$ there is performed a random trial capable of yielding a "success" or a "failure." $\theta(x)$ represents the probability of success, and we assume that it is a Borel measurable function of $x$. The outcome of the trial of one particle is independent of that of any other particle and also of all other random variables in the system.

Let $R$ stand for a Borel set in $E$ having a positive, possibly an infinite, measure. The subject of our study is the random variable $N$ representing the number of those particles in $R$ that are "successful." In particular, in Section 2 we study the conditions under which $P\{N = +\infty\} = 1$, in which case we shall say that $N$ is degenerate. Sections 3 and 4 are given to examples of the degeneracy of $N$ that appear to have unexpected features.

* Prepared with the partial support of the National Science Foundation, Grant GP-10.
h the par jal supP' t 
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2. CONDITIONS FOR N TO BE NONDEGENERATE 


The probability generating function of $N$ is given (Neyman and Scott, 1952) by the formula
$$G_N(t) = \exp\left\{ -\lambda \int \big(1 - G_\nu[1-(1-t)p(u)]\big)\,du \right\}, \tag{2.1}$$
where the integral in the exponent extends over the whole space $E$ and where
$$p(u) = \int_R f(x-u)\,\theta(x)\,dx \tag{2.2}$$
represents the probability that a particle $\pi$ belonging to a cluster centered at $u$ will be in $R$ and that it will be "successful."

The integral in (2.1) is never negative. If it is convergent, then (2.1) is equal 
to unity at t = 1 and N is finite with probability one. On the other hand, if for a 
fixed value of t between limits 0 < t < 1 the same integral diverges to infinity, then
P{N = k} = 0 for all finite k, and N is degenerate. It follows that to study the 
conditions under which N is degenerate means to study the conditions under which 
the integral in the right hand side of (2.1) is divergent. 

Theorem 1: In order that $N$ be nondegenerate, it is necessary that
$$I = \int p(u)\,du < +\infty. \tag{2.3}$$

The proof is based on the following simple inequality. For any nonnegative $w \le 1$,
$$1 - G_\nu(w) = 1 - \sum_{n=0}^{\infty} w^n P\{\nu = n\} \ge 1 - P\{\nu = 0\} - w(1 - P\{\nu = 0\}), \tag{2.4}$$
or
$$1 - G_\nu(w) \ge (1 - P\{\nu = 0\})(1 - w). \tag{2.5}$$
It follows that the integrand in (2.1)
$$1 - G_\nu[1-(1-t)p(u)] \ge (1 - P\{\nu = 0\})(1-t)\,p(u) \ge 0. \tag{2.6}$$

Thus, if the integral in (2.1) converges then $I$ must be finite.
Theorem 2: If the number $\nu$ of particles in a cluster has a finite expectation
$$\nu_1 = E(\nu) = G'_\nu(1), \tag{2.7}$$
then the convergence of $I$ is a sufficient condition for nondegeneracy of $N$.

Theorem 2 is due to Mr. A. H. Marcus. For any positive $w < 1$ we have
$$1 - G_\nu(w) = (1-w)\,G'_\nu(w^*), \tag{2.8}$$
where $w^*$ is a number between $w$ and unity. However, the derivative of a probability generating function is nondecreasing. It follows that
$$1 - G_\nu(w) \le (1-w)\,G'_\nu(1) = (1-w)\,\nu_1 \tag{2.9}$$
and, in particular,
$$1 - G_\nu[1-(1-t)p(u)] \le \nu_1 (1-t)\,p(u). \tag{2.10}$$

Thus, if $I$ is finite then the integral in (2.1) is convergent and $N$ is nondegenerate.
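The two bounds (2.5) and (2.9) sandwich $1 - G_\nu(w)$ between multiples of $1 - w$. As a small illustration under an assumed distribution (a geometric law on $1, 2, \ldots$, which is not taken from the paper; the function names are hypothetical), the sketch below checks both inequalities on a grid of $w$.

```python
def pgf_geometric(w, q):
    """PGF of the assumed law P{nu = n} = (1 - q) * q**(n - 1), n = 1, 2, ..."""
    return (1.0 - q) * w / (1.0 - q * w)


def bounds_hold(q, grid=1000, tol=1e-12):
    """Check (2.5) and (2.9) for 0 <= w <= 1 under the geometric law."""
    mean = 1.0 / (1.0 - q)  # nu_1 = G'(1) for this law
    prob_zero = 0.0         # the assumed law puts no mass at nu = 0
    for i in range(grid + 1):
        w = i / grid
        gap = 1.0 - pgf_geometric(w, q)
        lower = (1.0 - prob_zero) * (1.0 - w)  # right-hand side of (2.5)
        upper = mean * (1.0 - w)               # right-hand side of (2.9)
        if not (lower - tol <= gap <= upper + tol):
            return False
    return True


print(bounds_hold(0.3), bounds_hold(0.9))
```

For this law $1 - G_\nu(w) = (1-w)/(1-qw)$, so both bounds can also be confirmed by hand.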


Theorem 3: If the expectation $\nu_1$ and the measure of $R$ are both finite, then $N$ is nondegenerate.

In order to prove this theorem it is sufficient to show that the hypotheses imply the finiteness of $I$. Using (2.3) we have
$$I = \int_E du \int_R f(x-u)\,\theta(x)\,dx, \tag{2.11}$$
and, changing the order of integration and remembering that $f$ is a probability density the integral of which over $E$ is equal to unity,
$$I = \int_R \theta(x)\,dx \le m(R). \tag{2.12}$$


Naturally, depending upon the nature of the probability $\theta(x)$, the integral $I$ may be finite even if the region $R$ has an infinite measure. For example, this is the case when the clusters considered are those of galaxies and "success" means sufficient apparent brightness for a galaxy to be included in a catalogue. Here $R$ may be the whole three dimensional Euclidean space, with infinite measure. However, $\theta(x)$ tends to zero with increasing distance fast enough for the integral (2.12) to converge.

The above circumstances may suggest that when $I$ is finite, the supplementary assumption of Theorem 2, that the expectation $\nu_1$ is also finite, does not play any role and that the finiteness of $I$ by itself may guarantee the nondegeneracy of $N$. The example in the next section shows that this presumption is false.


3. EXAMPLE OF A BOUNDED REGION R WITH A DEGENERATE N


Let $E$ be a straight line and $R$ be the interval $[-1, +1]$. Also assume $\theta(x) \equiv 1$. Thus $I = 2$. We shall define a probability generating function $G_\nu(t)$ and the structure of clusters $f(x-u)$ such that the number $N$ of particles in $R$ will be infinite with probability one. Because of Theorem 3, this can be possible only if the expectation of $\nu$ is infinite.

For $n = 2, 3, \ldots$ let
$$P\{\nu = n\} = 1/n(n-1). \tag{3.1}$$
It is easy to see that the corresponding probability generating function is
$$G_\nu(t) = t + (1-t)\log(1-t) \quad \text{for } 0 \le t < 1. \tag{3.2}$$
Also, it is obvious that $t \to 1$ implies $G_\nu(t) \to 1$. Further,
$$1 - G_\nu[1-(1-t)p(u)] = (1-t)[1 - \log(1-t)]\,p(u) - (1-t)\,p(u)\log p(u). \tag{3.3}$$

The integral over $E$ of $p(u)$ being equal to 2, irrespective of the cluster structure $f$, the degeneracy or nondegeneracy of $N$ depends on whether the integral
$$J = -\int p(u)\log p(u)\,du \tag{3.4}$$
diverges or converges, respectively. The answer to the latter question depends upon the behaviour of $p(u)$ for large values of $|u|$. Obviously, $p(u)$ is symmetrical with respect to the origin of coordinates. Therefore, it will be sufficient to consider only values of $u > a$ and it will be convenient to set $a > e$ so that $\log u > 1$. Our purpose is to find $p(u)$ such that
$$\int_a^\infty p(u)\,du < +\infty, \quad \text{while} \quad -\int_a^\infty p(u)\log p(u)\,du = +\infty. \tag{3.5}$$
An example of such a function $p(u)$ is provided by
$$p(u) = \frac{C}{u\log^2 u}, \tag{3.6}$$
where $C$ is a constant to be used in defining $f$. However, the value of this constant does not affect the convergence of the integrals in (3.5).
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Easy calculations show that, for any a < b, 


$$\int_a^b \frac{du}{u\log^2 u} = \frac{1}{\log a} - \frac{1}{\log b}. \tag{3.7}$$

Thus, if $p(u)$ is given by (3.6) then the first of the integrals in (3.5) has a finite value $C/\log a$. Similar calculations give
$$\int_a^b p(u)\,|\log p(u)|\,du = C\int_a^b \frac{du}{u\log^2 u}\,\big|\log C - \log u - 2\log\log u\big| \tag{3.8}$$
and it is seen that the convergence of the second integral in (3.5) depends on that of
$$\int_a^b \frac{\log u}{u\log^2 u}\,du = \int_a^b \frac{du}{u\log u} = \log\log b - \log\log a. \tag{3.9}$$

However, as $b \to \infty$, the last expression tends to infinity, and it follows that, with probability equal to one, the number $N$ of particles in $[-1, +1]$ is infinite.
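The contrast between the two integrals in (3.5) can be seen numerically. The following sketch is an illustration only, with hypothetical function names; it works in the variable $v = \log u$, takes $C = 1$ and $\log a = 2$ as arbitrary assumed values, and also compares the closed form (3.2) with a partial sum of the series defined by (3.1).

```python
import math


def pgf_series(t, terms=2000):
    """Partial sum of sum_{n>=2} t**n / (n (n - 1)), from (3.1)."""
    return sum(t**n / (n * (n - 1)) for n in range(2, terms))


def pgf_closed(t):
    """Closed form (3.2)."""
    return t + (1.0 - t) * math.log(1.0 - t)


def mass(va, vb, c_const=1.0):
    """Integral of p(u) over a <= u <= b, with va = log a and vb = log b; see (3.7)."""
    return c_const * (1.0 / va - 1.0 / vb)


def entropy_integral(va, vb, c_const=1.0, steps=20000):
    """Integral (3.8) of p |log p|, by the midpoint rule in s = log v = log log u."""
    lo, hi = math.log(va), math.log(vb)
    h = (hi - lo) / steps
    log_c = math.log(c_const)
    total = 0.0
    for i in range(steps):
        s = lo + (i + 0.5) * h
        v = math.exp(s)
        # |log C - v - 2 log v| / v**2 dv, with dv = v ds
        total += abs(log_c - v - 2.0 * s) / v * h
    return c_const * total


print(pgf_series(0.5), pgf_closed(0.5))
for vb in (1e3, 1e6, 1e12):
    print(vb, mass(2.0, vb), entropy_integral(2.0, vb))
```

As $\log b$ grows, the first integral stays below $C/\log a$ while the second keeps growing roughly like $\log\log b$, in line with (3.9).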


At this point the question may arise whether formula (3.6) is consistent with the meaning of $p(u)$ defined by (2.2) with $\theta(x) \equiv 1$. In other words, it is appropriate to ask whether a probability density $f$ can be defined so that, at least for $|u| > a$,
$$\int_{-1}^{+1} f(x-u)\,dx = p(u), \tag{3.10}$$
where $p(u)$ is given by (3.6). The answer is in the affirmative. In fact, for $x$ between limits $|x| < a-1$ we may take any even nonnegative integrable function $g(x)$ and set $f(x) = C_1 g(x)$, where $C_1$ is an adjustable constant. For $x > a-1$ the definition of $f$ must conform with the properties of $p(u)$. We notice that for $u > a$, the function $p(u)$ is decreasing. Thus its derivative is negative. The definition of the corresponding $f(x)$ is made separately for each interval
$$a + 2n - 1 < x < a + 2n + 1 \tag{3.11}$$
with $n = 0, 1, 2, \ldots$. Namely, for each $n$ and $|\xi| < 1$,
$$f(a + 2n + \xi) = -\sum_{m=0}^{\infty} p'(a + 2n + 2m + 1 + \xi), \tag{3.12}$$
and it is easy to check that the series on the right is convergent and positive. Also, it is easy to verify that (3.12) satisfies (3.10) for any $u > a$. Since $f$ is an even function, its definition is now complete. This definition depends on two constants $C$ and $C_1$ which can be adjusted easily so that the integral of $f$ over $E$ is equal to unity.

To appreciate the result obtained fully, consider its "operational" interpretation as follows. Let $E$ denote a straight wire extending to infinity in both directions. Some moths are laying batches of eggs on this wire distributing them Poisson-wise with an arbitrarily low density, say one batch of eggs per mile, on the average. The number of eggs varies from one batch to the next, namely one-half of all batches contain two eggs, one-sixth of them three eggs, etc. Generally, the probability that a batch will contain exactly $n$ eggs is $1/n(n-1)$. The eggs hatch, and the resulting larvae begin to crawl along the wire. They do so independently from each other. The probability that, after a certain time, a larva starting from a point $u$ on $E$ will be found at some close distance is more or less arbitrary. However, when we come to large distances, the probability of the larva being found in a one-foot interval centered at $x$ is given by (3.6) with $|x-u|$ replacing $u$. The question is about the probability that the number $N$ of larvae in a one-foot length of the wire $E$, namely in the foot-length centered at the origin of coordinates, will exceed one million. Unexpectedly to the present writer, the probability in question is equal to unity. The feeling of surprise concerning this result stems from the well-established habit of "approximating" infinite regions of space by finite but large sections. Thus, we are keen to consider that a long wire may well stand for an infinite wire $E$. Yet, if in the above example the infinite wire $E$ is replaced by a finite wire extending from $-A$ to $+A$ then the integral (3.4) will have to be taken between the same limits and will be convergent so that the random variable $N$ will be nondegenerate.

The question may be asked whether, with the distribution of $\nu$ defined by (3.2) and with an infinite $E$, there are cases where $N$ is nondegenerate. The answer is in the affirmative. For example, if the structure of the cluster $f(x-u)$ vanishes whenever $|x-u|$ exceeds a certain limit, then the integrand in (2.1) differs from zero only over a set of points of finite measure and must converge no matter what the nature of $G_\nu(t)$ is. The general conclusion is that, if $E(\nu) = +\infty$, the nondegeneracy of $N$ depends on the inter-relationship between the speed of decrease of $P\{\nu = n\}$ as $n \to \infty$ and the speed of convergence to zero of $p(u)$ as $|u| \to \infty$. Here a general theorem establishing the necessary and sufficient conditions for the nondegeneracy of $N$ would be interesting.

4. NONDEGENERACY OF THE NUMBER OF IMAGES OF GALAXIES ON A PHOTOGRAPHIC PLATE


Let $E$ stand for the three dimensional Euclidean space. We treat the galaxies as dimensionless luminous particles distributed in $E$ with clustering of first order. We assume that the expected number $\nu_1$ of galaxies per cluster is finite. Also we assume that each galaxy emits photons in all directions Poisson-wise, their average number per unit time and per unit solid angle being $\mu$, the same for all galaxies. The emissions by particular galaxies are mutually independent and the energy of all the photons is the same, independent of the distances of galaxies.

A photograph of a region $R$ in the sky is taken with a telescope. The area of the telescope's mirror is $A$. The photons hitting the mirror are directed towards a point on the photographic plate depending upon the position of the galaxy emitting them. The exposure time is $T$ units. Our basic assumption is that, in order that the image of a galaxy be visible on the photographic plate it is necessary and sufficient that the number of photons from this galaxy reaching the mirror of the telescope during the exposure time $T$ be at least equal to a fixed number $s$. Let $N$ denote the number of images of galaxies on the photographic plate. The problem is to find under what conditions, if any, the random variable $N$ is degenerate.

It will be realized that the above assumptions regarding galaxies, about their luminosity properties, etc. are gross over-simplifications. However, this circumstance does not detract from the interest of the example considered.
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It will be realized that the distribution of N is determined by formula (2.1) 
provided one defines appropriately the meaning of the “success” of any particular 
galaxy and calculates the probability O(x) of this success, given that the coordinates 
of the galaxy in space are $x = (x_1, x_2, x_3)$.

Let $x$ be a point within the solid angle $R$ over which the photograph of the sky is taken. We assume that the origin of coordinates is at the observer and denote by $\xi$ the distance of the point $x$,
$$\xi = \{x_1^2 + x_2^2 + x_3^2\}^{1/2}. \tag{4.1}$$

Obviously, in order that a galaxy located at $x$ be "successful" in having its image recorded on the photographic plate, it is necessary and sufficient that during time $T$ this galaxy emits at least $s$ photons in the direction of the telescope's mirror. To a sufficient degree of approximation, the number of photons so emitted during time $T$ is a Poisson variable, say $Y$, with expectation
$$E(Y) = \mu T A/\xi^2 = \rho/\xi^2, \text{ say.} \tag{4.2}$$

Thus, the probability of "success" of the galaxy is
$$\theta(x) = P\{Y \ge s \mid x\} = e^{-\rho/\xi^2} \sum_{n=s}^{\infty} \frac{(\rho/\xi^2)^n}{n!} = \frac{1}{(s-1)!} \int_0^{\rho/\xi^2} e^{-t} t^{s-1}\,dt. \tag{4.3}$$
nes N! (s—1)! 
According to Theorems 1 and 2, in order that $N$ be nondegenerate, it is necessary and sufficient that the integral
$$I = \int_E p(u)\,du = \int_R \theta(x)\,dx = \frac{1}{(s-1)!} \int_R dx \int_0^{\rho/\xi^2} e^{-t} t^{s-1}\,dt \tag{4.4}$$
be convergent. $R$ being a solid angle with its vertex at the origin of coordinates, the transformation to polar coordinates easily leads to the formula
$$I = C \int_0^\infty \xi^2\,d\xi \int_0^{\rho/\xi^2} e^{-t} t^{s-1}\,dt, \tag{4.5}$$
where $C$ is a constant depending upon $s$ and $R$. Now, it is easy to verify that the integral (4.5) converges when $s \ge 2$ and diverges when $s = 1$, irrespective of the value of $\rho = \mu T A > 0$. Thus, we come to the following paradoxical conclusion: the finiteness of the number of images of galaxies on the photograph of the sky depends only upon the sensitivity of the emulsion, but not on the size of the telescope and not on the length of exposure. If the emulsion of the photographic plate is so sensitive as to be able to record a single photon, then, with probability one, the number of images of galaxies in the photograph will be infinite even if this photograph is taken with a tiny telescope and with a very brief exposure. On the other hand, if at least two photons reaching the photographic plate are necessary to produce an image then, no matter how large the telescope and no matter how long the exposure may be, the number of images of galaxies will be finite, with probability one.
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EFFECTIVE ENTROPY RATE AND TRANSMISSION OF 
INFORMATION THROUGH CHANNELS WITH 
ADDITIVE RANDOM NOISE 


By K. R. PARTHASARATHY 
Indian Statistical Institute

SUMMARY. Transmission of information through channels with additive noise is considered. Coding theorem and its converse are established for these channels with a certain notion of capacity. This capacity is explicitly computed for this class of channels.

1. INTRODUCTION


The famous McMillan's theorem regarding ergodic sources can be reformulated as follows. Consider the minimum number of n-length sequences which have a total probability exceeding $1-\epsilon$. If $\mu$ is the measure describing the source denote this minimum by $N_n(\epsilon, \mu)$. McMillan's theorem states that for every ergodic source $\mu$ the limit $\lim_{n\to\infty} \frac{1}{n}\log N_n(\epsilon, \mu)$ exists and is always equal to the entropy rate of the source as defined by Shannon. The question arises as to what happens to the sequence $\frac{1}{n}\log N_n(\epsilon, \mu)$ as $n \to \infty$ when the source is not necessarily ergodic but stationary. We show that except for a countable number of $\epsilon$'s the limit always exists and in general depends on $\epsilon$. We construct two functions $A(\epsilon)$ and $B(\epsilon)$, $(0 < \epsilon < 1)$, which coincide except on a countable set, and both the upper and the lower limit of $\frac{1}{n}\log N_n(\epsilon, \mu)$ lie between $A(\epsilon)$ and $B(\epsilon)$. Further, as $\epsilon \to 0$, both the functions $A(\epsilon)$ and $B(\epsilon)$ converge to a unique limit $H(\mu)$. The precise description of the functional $H(\mu)$ is also given.

In the last section we introduce the notion of a channel with additive noise. Here the input and output alphabets coincide with a finite abelian group $A$ and the noise is distributed according to an arbitrary stationary measure $\mu$ on the sequence space A'. When a message sequence is sent through the channel the noise gets added to the message independently of the message. The disturbed message is received as the output. The binary symmetric channel is a typical example. For the channel with additive noise distributed according to a stationary measure $\mu$, we denote by $M_n(\epsilon, \mu)$ the supremum of the length of all possible codes with probability of error less than or equal to $\epsilon$ (for transmission of messages during the time period $1, 2, \ldots, n$). Here a code of error less than or equal to $\epsilon$ is defined in the sense of Wolfowitz (1961). We analyse the asymptotic behaviour of the sequence $\frac{1}{n}\log M_n(\epsilon, \mu)$. We show that the limit of this sequence exists for all $\epsilon$ except on a countable set. We also show that the upper and the lower limit of this sequence lie between $\log a - A(\epsilon)$ and $\log a - B(\epsilon)$, where $A(\epsilon)$ and $B(\epsilon)$ are the functions mentioned in the previous paragraph and $a$ is the number of elements in the alphabet $A$. As $\epsilon \to 0$,
kë 
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log a—A(e) and log a—B(e) converge to the same limit log a—H(u). Thus in this 
case the capacity of the channel is not described by a single number but by the two 
functions A(e) and B(e). 


The idea of studying the asymptotic properties of the sequence $\frac{1}{n}\log N_n(\varepsilon, \mu)$ is due to Winkelbauer. He gave the description of the function
$$\bar{H}(\mu) = \lim_{\varepsilon\to 0}\,\limsup_{n\to\infty}\,\frac{1}{n}\log N_n(\varepsilon, \mu)$$
in terms of the entropies of the ergodic components of $\mu$. It was stated by him without proof in his lecture at the Indian Statistical Institute.


2. PRELIMINARIES 


Throughout this paper $A$ will denote a finite alphabet, $A'$ the space of sequences of elements from $A$, $T$ the shift transformation in $A'$ and $\mu$ a measure which is defined on the usual $\sigma$-field of $A'$ and invariant under $T$. We denote by $[x_1, x_2, \ldots, x_n]$ the cylinder set in $A'$ of all sequences whose $i$-th coordinate is $x_i$ for $i = 1, 2, \ldots, n$. Any n-length sequence $x_1, x_2, \ldots, x_n$ is referred to as a u-sequence. We denote by $N_n(\varepsilon, \mu)$ the smallest number of u-sequences whose total probability is greater than or equal to $1-\varepsilon$. The smallest set attaining this number may not be unique. We choose one of them arbitrarily and denote it by $A_n(\varepsilon, \mu)$.
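
The quantity $N_n(\varepsilon, \mu)$ can be computed by brute force for very small $n$, which may help in reading the definitions above. The following is only an illustrative sketch: the helper names `min_sequences`, `bernoulli` and `mixture` are hypothetical, the alphabet is assumed binary, and the two example sources (an i.i.d. source and a mixture of two i.i.d. sources, the latter being stationary but not ergodic) are our own choice rather than examples from the paper.

```python
from itertools import product
from math import log2

def min_sequences(prob_of, alphabet, n, eps):
    """Smallest number of n-length sequences whose total probability is >= 1 - eps.

    prob_of maps a tuple of n symbols to its probability (hypothetical interface).
    Taking the most probable sequences first is optimal for this count.
    """
    probs = sorted((prob_of(seq) for seq in product(alphabet, repeat=n)), reverse=True)
    total, count = 0.0, 0
    for p in probs:
        if total >= 1.0 - eps - 1e-12:  # small tolerance for float rounding
            break
        total += p
        count += 1
    return count

def bernoulli(p):
    """Probability of a binary sequence under an i.i.d. source with P(1) = p."""
    return lambda seq: p ** sum(seq) * (1 - p) ** (len(seq) - sum(seq))

def mixture(p1, p2, weight):
    """A stationary, non-ergodic source: mixture of two i.i.d. sources (assumed example)."""
    f1, f2 = bernoulli(p1), bernoulli(p2)
    return lambda seq: weight * f1(seq) + (1 - weight) * f2(seq)

if __name__ == "__main__":
    n = 12
    for eps in (0.2, 0.5, 0.8):
        count = min_sequences(mixture(0.05, 0.5, 0.5), (0, 1), n, eps)
        print(eps, count, log2(count) / n)
```

For the mixture, the printed rate $\frac{1}{n}\log N_n(\varepsilon, \mu)$ can be expected to depend noticeably on $\varepsilon$, which is the phenomenon the paper studies; at such small $n$ the numbers are only suggestive.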


If we assign the discrete topology to $A$ and the product topology to $A'$ then $A'$ becomes a compact metric space. We shall now follow the notation of Oxtoby (1952). If $f(p)$ is a real valued function on $A'$, let
$$M(f, p, k) = f_k(p) = \frac{1}{k}\sum_{i=1}^{k} f(T^i p) \qquad (k = 1, 2, \ldots)$$
and
$$M(f, p) = f^*(p) = \lim_{k\to\infty} M(f, p, k)$$
in case this limit exists. A Borel subset $E$ of $A'$ is said to have invariant measure one if $\mu(E) = 1$ for every invariant probability measure $\mu$. Let $Q$ be the set of points $p$ for which $M(f, p)$ exists for every $f \in C(A')$, where $C(A')$ is the space of continuous functions on $A'$. It follows easily from Riesz's representation theorem that corresponding to any point $p \in Q$ there exists a unique invariant probability measure $\mu_p$ such that
$$M(f, p) = \int f \, d\mu_p.$$
Let $R \subset Q$ be the set of those points for which $\mu_p$ is ergodic. $R$ is called the set of regular points. Then we have the following representation theorem of Kryloff and Bogoliouboff which can be found in Oxtoby (1952).


Theorem 2.1: The set $R$ of regular points is a Borel measurable set of invariant measure one. For any Borel set $E \subset A'$, $\mu_p(E)$ is Borel measurable on $R$ and
$$\mu(E) = \int_R \mu_p(E)\, d\mu(p)$$
for any invariant probability measure $\mu$.
Let $H(\mu)$ denote the entropy of any invariant probability measure $\mu$. Let
$$\bar{H}(\mu) = \operatorname{ess\,sup} H(\mu_p) \tag{2.1}$$
$$\underline{H}(\mu) = \operatorname{ess\,inf} H(\mu_p) \tag{2.2}$$
where the essential supremum and the essential infimum are taken relative to $\mu$ and $\mu_p$ denotes the ergodic measure corresponding to the regular point $p$.


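3. ASYMPTOTIC PROPERTIES OF THE FUNCTION $N_n(\varepsilon, \mu)$


In this section we shall prove the following theorem and two corollaries.

Theorem 3.1: Let $[A', \mu]$ be an arbitrary stationary source. Then
$$A(\varepsilon) \le \liminf_{n\to\infty} \frac{1}{n}\log N_n(\varepsilon, \mu) \le \limsup_{n\to\infty} \frac{1}{n}\log N_n(\varepsilon, \mu) \le B(\varepsilon) \tag{3.1}$$
where
$$A(\varepsilon) = \lim_{\delta \downarrow \varepsilon} \eta(\delta), \tag{3.2}$$
$$B(\varepsilon) = \lim_{\delta \uparrow \varepsilon} \eta'(\delta), \tag{3.3}$$
$\eta(\delta)$ is the greatest number with the property
$$\mu[p : H(\mu_p) \ge \eta] \ge \delta$$
and $\eta'(\delta)$ is the smallest number with the property
$$\mu[p : H(\mu_p) \le \eta'] \ge 1-\delta.$$

Corollary 3.1: For any stationary source $[A', \mu]$, $\lim_{n\to\infty} \frac{1}{n}\log N_n(\varepsilon, \mu)$ exists for all $0 < \varepsilon < 1$ except for a countable set.

Corollary 3.2: For any stationary source $[A', \mu]$,
$$\lim_{\varepsilon\to 0}\,\liminf_{n\to\infty}\,\frac{1}{n}\log N_n(\varepsilon, \mu) = \lim_{\varepsilon\to 0}\,\limsup_{n\to\infty}\,\frac{1}{n}\log N_n(\varepsilon, \mu) = \bar{H}(\mu).$$

Before proceeding to the proof of Theorem 3.1 we need to establish two lemmas.
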
Lemma 3.1: For any stationary source $[A', \mu]$, the limit
$$\lim_{n\to\infty} -\frac{1}{n}\log \mu[x_1, x_2, \ldots, x_n] = g_\mu(x)$$
exists in measure and $g_\mu(x) = H(\mu_x)$ a.e. $x(\mu)$.

Proof: The existence of the limit is the famous McMillan's theorem. In the course of Khinchin's proof of McMillan's theorem (1957) it can be seen that $g_\mu(x)$ can be obtained as follows. Define $h_\mu(x)$ as the negative logarithm of the conditional probability of $x_0$ given $x_{-1}, x_{-2}, \ldots$ under $\mu$. (Here it is assumed that $x = (\ldots, x_{-1}, x_0, x_1, \ldots)$.) Then
$$g_\mu(x) = \lim_{n\to\infty} \frac{h_\mu(x) + h_\mu(Tx) + \cdots + h_\mu(T^{n-1}x)}{n} \quad \text{a.e. } x(\mu).$$
But by Theorem 2.6 of the author (1961), we have $h_\mu(x) = h_{\mu_p}(x)$ a.e. $x(\mu_p)$ for almost all $p(\mu)$. Further
$$\lim_{n\to\infty} \frac{h_{\mu_p}(x) + h_{\mu_p}(Tx) + \cdots + h_{\mu_p}(T^{n-1}x)}{n} = H(\mu_p) \quad \text{a.e. } x(\mu_p)$$
for any regular point $p$. Thus $g_\mu(x) = H(\mu_p)$ a.e. $x(\mu_p)$ for almost all $p(\mu)$. From the Kryloff-Bogoliouboff theory of regular points in a dynamical system we have
$$\mu_x = \mu_p \quad \text{a.e. } x(\mu_p).$$
Thus $g_\mu(x) = H(\mu_x)$ a.e. $x(\mu_p)$ for almost all $p(\mu)$. Thus the set $E = [x : g_\mu(x) \ne H(\mu_x)]$ has measure zero under $\mu_p$ for almost all $p(\mu)$. An application of Theorem 2.1 shows that
$$\mu(E) = \int \mu_p(E)\, d\mu(p) = 0.$$
This completes the proof of Lemma 3.1.

Lemma 3.2: For any stationary source $[A', \mu]$ and any $\varepsilon > 0$
$$\underline{H}(\mu) \le \liminf_{n\to\infty} \frac{1}{n}\log N_n(\varepsilon, \mu) \le \limsup_{n\to\infty} \frac{1}{n}\log N_n(\varepsilon, \mu) \le \bar{H}(\mu).$$

Proof: From Lemma 3.1 it is clear that
$$\underline{H}(\mu) \le g_\mu(x) \le \bar{H}(\mu)$$
with probability one. Thus if we write, for any fixed $\eta > 0$,
$$\mu\Big\{x : -\frac{1}{n}\log \mu[x_1, \ldots, x_n] \ge \underline{H}(\mu) - \eta\Big\} = 1 - \delta_n \tag{3.4}$$
then $\delta_n \to 0$ as $n \to \infty$. The complement of the set written within braces in (3.4) has probability $\delta_n$. Any set with probability $\ge 1-\varepsilon$ must have a subset with probability greater than or equal to $1-\varepsilon-\delta_n$ whose elements satisfy the inequality within braces in (3.4). Suppose this subset has $N'$ u-sequences. For these sequences
$$\mu[x_1, \ldots, x_n] \le 2^{-n(\underline{H}(\mu)-\eta)}.$$
Summing up over this subset, we get
$$1-\varepsilon-\delta_n \le N'\, 2^{-n(\underline{H}(\mu)-\eta)}.$$
Thus
$$1-\varepsilon-\delta_n \le N_n(\varepsilon, \mu)\, 2^{-n(\underline{H}(\mu)-\eta)}.$$
Since $\delta_n \to 0$ as $n \to \infty$, after some stage
$$1-\varepsilon-\delta_n > \frac{1-\varepsilon}{2}.$$
Thus
$$\frac{1}{n}\log N_n(\varepsilon, \mu) \ge \frac{1}{n}\log\frac{1-\varepsilon}{2} + \underline{H}(\mu) - \eta$$
which implies
$$\liminf_{n\to\infty} \frac{1}{n}\log N_n(\varepsilon, \mu) \ge \underline{H}(\mu) - \eta.$$
Since $\eta$ is arbitrary we have
$$\liminf_{n\to\infty} \frac{1}{n}\log N_n(\varepsilon, \mu) \ge \underline{H}(\mu).$$

In order to prove the other inequality, consider, for any fixed $\eta > 0$, the sequence of numbers
$$\mu\Big\{x : -\frac{1}{n}\log \mu[x_1, \ldots, x_n] \le \bar{H}(\mu) + \eta\Big\} = 1 - \delta_n'. \tag{3.5}$$
By Lemma 3.1 and (2.1) we have $\lim_{n\to\infty} \delta_n' = 0$. Thus there exists a subset $A_n$ of u-sequences satisfying the inequality within braces in (3.5) whose probability exceeds $1-\varepsilon$ for all sufficiently large $n$. If the inequality within braces in (3.5) is satisfied, then
$$\mu[x_1, \ldots, x_n] \ge 2^{-n(\bar{H}(\mu)+\eta)}. \tag{3.6}$$
Hence at most $2^{n(\bar{H}(\mu)+\eta)}$ u-sequences satisfying (3.6) are required to make up a probability greater than $1-\varepsilon$, so that
$$\frac{1}{n}\log N_n(\varepsilon, \mu) \le \bar{H}(\mu) + \eta.$$
Thus
$$\limsup_{n\to\infty} \frac{1}{n}\log N_n(\varepsilon, \mu) \le \bar{H}(\mu) + \eta.$$
The arbitrariness of $\eta$ implies the validity of the lemma.

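We shall now turn to the proof of Theorem 3.1. Choose any $\delta > \varepsilon$. Choose the largest $\eta$ such that
$$\mu[p : H(\mu_p) \ge \eta] \ge \delta.$$
Let it be $\eta(\delta)$. If $E = [p : H(\mu_p) \ge \eta(\delta)]$ then $\mu(E) \ge \delta$. Define
$$\mu_1(B) = \frac{\mu(B \cap E)}{\mu(E)}, \qquad \mu_2(B) = \frac{\mu(B \cap E')}{\mu(E')}.$$
We assume that $\mu(E') > 0$. Then
$$\mu = a\mu_1 + (1-a)\mu_2$$
where $a = \mu(E) \ge \delta > \varepsilon$. If we consider the set $A_n(\varepsilon, \mu)$ (the smallest set of u-sequences with probability $\ge 1-\varepsilon$) then, since $\mu_2(A_n(\varepsilon, \mu)) \le 1$,
$$\mu_1(A_n(\varepsilon, \mu)) \ge \frac{1-\varepsilon-(1-a)}{a} = 1 - \frac{\varepsilon}{a} \ge \frac{\delta-\varepsilon}{\delta} > 0.$$
Thus
$$N_n(\varepsilon, \mu) \ge N_n\Big(\frac{\varepsilon}{a}, \mu_1\Big)$$
and
$$\liminf_{n\to\infty} \frac{1}{n}\log N_n(\varepsilon, \mu) \ge \liminf_{n\to\infty} \frac{1}{n}\log N_n\Big(\frac{\varepsilon}{a}, \mu_1\Big).$$
An application of Lemma 3.2 shows that
$$\liminf_{n\to\infty} \frac{1}{n}\log N_n(\varepsilon, \mu) \ge \underline{H}(\mu_1) \ge \eta(\delta).$$
If $\mu(E') = 0$ this inequality is trivially valid. Since $\delta$ is any number $> \varepsilon$ and $\eta(\delta)$ increases to $A(\varepsilon)$ as $\delta$ descends to $\varepsilon$ we have
$$\liminf_{n\to\infty} \frac{1}{n}\log N_n(\varepsilon, \mu) \ge A(\varepsilon). \tag{3.7}$$

For proving the other inequality choose $\delta < \varepsilon$ and then choose the smallest $\eta'$ with the property
$$\mu[p : H(\mu_p) \le \eta'] \ge 1-\delta.$$
Let it be $\eta'(\delta)$. Let $F = [p : H(\mu_p) \le \eta'(\delta)]$, and
$$\mu_1(B) = \frac{\mu(B \cap F)}{\mu(F)}, \qquad \mu_2(B) = \frac{\mu(B \cap F')}{\mu(F')}.$$
Then
$$\mu = b\mu_1 + (1-b)\mu_2,$$
where $b = \mu(F) \ge 1-\delta$. If $\mu_1(B) \ge \dfrac{1-\varepsilon}{b}$ then $\mu(B) \ge 1-\varepsilon$. Thus
$$N_n(\varepsilon, \mu) \le N_n\Big(1 - \frac{1-\varepsilon}{b}, \mu_1\Big).$$
Thus by Lemma 3.2
$$\limsup_{n\to\infty} \frac{1}{n}\log N_n(\varepsilon, \mu) \le \bar{H}(\mu_1) \le \eta'(\delta).$$
As $\delta$ increases to $\varepsilon$, $\eta'(\delta)$ decreases to a limit $B(\varepsilon)$. Thus
$$\limsup_{n\to\infty} \frac{1}{n}\log N_n(\varepsilon, \mu) \le B(\varepsilon). \tag{3.8}$$
(3.7) and (3.8) complete the proof of Theorem 3.1. Corollary 3.1 is an immediate consequence of the fact that in the real line there cannot be more than a countable collection of disjoint open intervals and hence $A(\varepsilon) = B(\varepsilon)$ except for a countable set. Corollary 3.2 follows immediately from the fact that $A(\varepsilon)$ and $B(\varepsilon)$ converge to $\bar{H}(\mu)$ as $\varepsilon \to 0$.

Remark: From Theorem 3.1 and Corollary 3.2 it is clear that the number $\bar{H}(\mu)$ defined by (2.1) can be rightly called the effective entropy rate of the stationary source $[A', \mu]$. The result that
$$\lim_{\varepsilon\to 0}\,\lim_{n\to\infty}\,\frac{1}{n}\log N_n(\varepsilon, \mu) = \bar{H}(\mu)$$
is due to K. Winkelbauer. It was stated by him without proof in one of his lectures at the Indian Statistical Institute. It was his conjecture that $\lim_{n\to\infty} \frac{1}{n}\log N_n(\varepsilon, \mu)$ exists for every $\varepsilon$.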
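
As a hedged illustration of the functions $A(\varepsilon)$ and $B(\varepsilon)$ of Theorem 3.1, consider the special case (our assumption, not an example from the paper) where $\mu$ has only finitely many ergodic components, given as pairs (entropy, weight) with weights summing to one. In that case the limits (3.2) and (3.3) reduce to the finite maxima and minima computed below. The function names are hypothetical.

```python
def tail_mass(components, eta):
    """mu[p : H(mu_p) >= eta] for a finite list of (entropy, weight) pairs."""
    return sum(w for h, w in components if h >= eta)

def lower_mass(components, eta):
    """mu[p : H(mu_p) <= eta] for a finite list of (entropy, weight) pairs."""
    return sum(w for h, w in components if h <= eta)

def A(components, eps):
    """A(eps) = lim of eta(delta) as delta decreases to eps.

    For finitely many components this is the largest entropy value h
    whose upper tail mass strictly exceeds eps (assumes 0 < eps < 1).
    """
    return max(h for h, _ in components if tail_mass(components, h) > eps)

def B(components, eps):
    """B(eps) = lim of eta'(delta) as delta increases to eps.

    For finitely many components this is the smallest entropy value h
    whose lower mass strictly exceeds 1 - eps (assumes 0 < eps < 1).
    """
    return min(h for h, _ in components if lower_mass(components, h) > 1 - eps)

if __name__ == "__main__":
    # Two ergodic components with entropies 0.5 and 1.0 bits, equal weights.
    source = [(0.5, 0.5), (1.0, 0.5)]
    for eps in (0.3, 0.5, 0.7):
        print(eps, A(source, eps), B(source, eps))
```

In this two-component example the two functions agree except at $\varepsilon = 0.5$, where $A$ gives the smaller entropy and $B$ the larger one; this is the kind of countable exceptional set referred to in Corollary 3.1.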


4. CHANNELS WITH ADDITIVE NOISE 


In this section we introduce the notion of a stationary channel with additive noise, define its capacity and prove the coding theorem as well as its converse.

Let the input and output alphabets of a channel coincide with a finite abelian group $A$. In a natural way the space $A'$ becomes an abelian group. We denote by $+$ and $-$ the addition and inverse operation in the group $A'$. For any set $E$ we write
$$E - x = [z : z \in A',\ z + x \in E].$$
Let $\mu$ be an invariant measure defined on $A'$. Then the probability distributions
$$\nu_x(F) = \mu(F - x) \tag{4.1}$$
where $F$ is any set in the usual $\sigma$-field of $A'$ and $x$ is any point in $A'$ define a stationary channel. The distribution at the output of this channel corresponding to any input distribution $\lambda$ is obtained by convoluting $\lambda$ with $\mu$. Even if $\mu$ is ergodic this channel need not be of finite memory in the sense of Feinstein. If the group $A$ consists of two elements 0 and 1 only, the addition is done modulo 2 and $\mu$ is the product measure obtained by assuming probability $p$ for one, then we get the so-called binary symmetric channel. We shall call a channel whose distributions are specified by (4.1) a channel with additive noise and noise distribution $\mu$.
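
A minimal sketch of the additive-noise mechanism may be useful here, assuming the group $A$ is the integers modulo some `modulus` (a hypothetical parameter name). The second part evaluates the classical capacity $1 - h(p)$ of the binary symmetric channel, which agrees with the expression $\log a - \bar{H}(\mu)$ obtained later in this section when the noise is i.i.d. (hence ergodic) with $a = 2$; the function names are ours.

```python
from math import log2

def add_noise(message, noise, modulus):
    """Output of an additive-noise channel over the cyclic group Z_modulus.

    Each output symbol is the input symbol plus the noise symbol, taken
    modulo `modulus`; the noise is generated independently of the message.
    """
    return [(x + z) % modulus for x, z in zip(message, noise)]

def binary_entropy(p):
    """Entropy in bits of a binary symbol with P(1) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_capacity(p):
    """log a - H(mu) with a = 2 and i.i.d. noise of crossover probability p."""
    return log2(2) - binary_entropy(p)

if __name__ == "__main__":
    print(add_noise([0, 1, 1, 0], [1, 1, 0, 0], 2))
    print(bsc_capacity(0.1))
```

For a noise measure that is stationary but not ergodic, a single number of this kind no longer suffices, which is why the paper works with the two functions $A(\varepsilon)$ and $B(\varepsilon)$.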


A code with error $\varepsilon$ and length $N$ is a collection of u-sequences $u_1, u_2, \ldots, u_N$ and sets $V_1, V_2, \ldots, V_N$ of u-sequences with the properties
$$\text{(i)}\ \ \mu(V_i - u_i) \ge 1-\varepsilon, \qquad \text{(ii)}\ \ V_i \cap V_j = \emptyset \ \text{ for } i \ne j. \tag{4.2}$$
Let $M_n(\varepsilon, \mu)$ be the maximal length possible for a code with error $\varepsilon$ for a channel with additive noise and noise distribution $\mu$. Then we have the following theorem.


Theorem 4.1: For a stationary channel with additive noise and noise distribution $\mu$
$$\log a - B(\varepsilon) \le \liminf_{n\to\infty} \frac{1}{n}\log M_n(\varepsilon, \mu) \le \limsup_{n\to\infty} \frac{1}{n}\log M_n(\varepsilon, \mu) \le \log a - A(\varepsilon)$$
where $A(\varepsilon)$ and $B(\varepsilon)$ are the functions occurring in the statement of Theorem 3.1.

Corollary 4.1: Except for a countable set of $\varepsilon$'s the limit
$$\lim_{n\to\infty} \frac{1}{n}\log M_n(\varepsilon, \mu) = \log a - A(\varepsilon) = \log a - B(\varepsilon)$$
exists. Further
$$\lim_{\varepsilon\to 0}\,[\log a - A(\varepsilon)] = \lim_{\varepsilon\to 0}\,[\log a - B(\varepsilon)] = \log a - \bar{H}(\mu).$$

Remark: Corollary 4.1 justifies our calling $\log a - \bar{H}(\mu)$ the capacity of the additive channel with noise distribution $\mu$.

Proof of Theorem 4.1: Let $u_1, u_2, \ldots, u_N$; $V_1, V_2, \ldots, V_N$ be a code with error $\varepsilon$. For any set $E$ of u-sequences let $m(E)$ be the number of u-sequences in $E$. By property (i) of (4.2) we have
$$m(V_i) \ge N_n(\varepsilon, \mu). \tag{4.3}$$
For any $\delta > 0$ and all sufficiently large $n$ we have by Theorem 3.1 and (4.3)
$$m(V_i) \ge 2^{n(A(\varepsilon)-\delta)}.$$
Since the $V_i$'s are disjoint and the total number of u-sequences is $a^n$, we obtain
$$a^n \ge \sum_{i=1}^{N} m(V_i) \ge N\, 2^{n(A(\varepsilon)-\delta)}.$$
Thus
$$\frac{1}{n}\log N \le \log a - A(\varepsilon) + \delta. \tag{4.4}$$
Since (4.4) is true for any code of error $\varepsilon$, we get
$$\frac{1}{n}\log M_n(\varepsilon, \mu) \le \log a - A(\varepsilon) + \delta.$$
Allowing $n$ to tend to infinity and then noting the arbitrariness of $\delta$, we get
$$\limsup_{n\to\infty} \frac{1}{n}\log M_n(\varepsilon, \mu) \le \log a - A(\varepsilon).$$

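For proving the other inequality in the theorem we follow Takano (1957). We take $\varepsilon'$ an arbitrary number less than $\varepsilon$. Choose the set $V_1$ to be the set of the smallest number of u-sequences whose probability exceeds $1-\varepsilon'$. In the notation given in Section 2, $V_1 = A_n(\varepsilon', \mu)$. Let $u_1$ be the u-sequence all of whose elements coincide with the identity of the group $A$. Now choose $u_2$ such that
$$\mu[((V_1 + u_2) \cap V_1') - u_2] \ge 1-\varepsilon$$
where the prime is used to denote the complement. If no such $u_2$ exists stop. Write
$$V_2 = (V_1 + u_2) \cap V_1'.$$
Choose $u_3$ such that
$$\mu[((V_1 + u_3) \cap V_2' \cap V_1') - u_3] \ge 1-\varepsilon.$$
If no such $u_3$ exists stop. At the $r$-th stage $u_r$ is chosen such that
$$\mu[((V_1 + u_r) \cap V_{r-1}' \cap \cdots \cap V_1') - u_r] \ge 1-\varepsilon.$$
Then we write
$$V_r = (V_1 + u_r) \cap V_{r-1}' \cap V_{r-2}' \cap \cdots \cap V_1'.$$
Let the process terminate after $N$ stages. Let $V = \bigcup_{i=1}^{N} V_i$. Then we have u-sequences $u_1, u_2, \ldots, u_N$ and sets $V_1, V_2, \ldots, V_N$ of u-sequences with the properties

(1) $\mu(V_i - u_i) \ge 1-\varepsilon$,
(2) $V_i \subset V_1 + u_i$,
(3) $V_i \cap V_j = \emptyset$ for $i \ne j$,
(4) $V = \bigcup_{i=1}^{N} V_i = \bigcup_{i=1}^{N} (V_1 + u_i)$,
(5) for any u-sequence $u$, $\mu[((V_1 + u) \cap V') - u] < 1-\varepsilon$.

Since $V_1$ has probability greater than $1-\varepsilon'$, we have from property (5), for any u-sequence $u$,
$$1-\varepsilon' < \mu(V_1) = \mu[(V_1 + u) - u] \le \mu[((V_1 + u) \cap V') - u] + \mu[((V_1 + u) \cap V) - u] < 1-\varepsilon + \mu(V - u).$$
This inequality can be rewritten as
$$\mu(V - u) > \varepsilon - \varepsilon'. \tag{4.5}$$
Since (4.5) is true for every $u$, we obtain by multiplying both sides of (4.5) by $a^{-n}$ and adding over all the u-sequences
$$m(V)\, a^{-n} \ge \varepsilon - \varepsilon'$$
or
$$m(V) \ge a^n (\varepsilon - \varepsilon'). \tag{4.6}$$
Since $V = \bigcup_{i=1}^{N} V_i$ and $V_i \subset V_1 + u_i$ we have
$$m(V) \le N \cdot m(V_1). \tag{4.7}$$
We recall that $V_1 = A_n(\varepsilon', \mu)$. By an application of Theorem 3.1 we have, for any $\delta > 0$ and all sufficiently large $n$,
$$m(V_1) \le 2^{n(B(\varepsilon')+\delta)}. \tag{4.8}$$
Combining (4.6), (4.7) and (4.8)
$$N \ge (\varepsilon - \varepsilon')\, a^n\, 2^{-n(B(\varepsilon')+\delta)}.$$
Since $M_n(\varepsilon, \mu) \ge N$ we have
$$\frac{1}{n}\log M_n(\varepsilon, \mu) \ge \frac{1}{n}\log(\varepsilon - \varepsilon') + [\log a - B(\varepsilon')] - \delta.$$
Allowing $n \to \infty$ and then $\delta \to 0$, we get
$$\liminf_{n\to\infty} \frac{1}{n}\log M_n(\varepsilon, \mu) \ge \log a - B(\varepsilon').$$
From the definition of the function $B(\varepsilon)$ we see that it is left continuous. Since $\varepsilon'$ is any number less than $\varepsilon$, we get by letting $\varepsilon'$ increase to $\varepsilon$
$$\liminf_{n\to\infty} \frac{1}{n}\log M_n(\varepsilon, \mu) \ge \log a - B(\varepsilon).$$

Corollary 4.1 is an immediate consequence of Corollaries 3.1 and 3.2.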
A STABILITY THEOREM FOR THE BINOMIAL LAW*
By B. RAMACHANDRAN 


Institute of Mathematical Sciences, Madras 


SUMMARY. It is shown that if the sum of two independent (one-dimensional) random variables is "approximately" Binomial, the same is true of the individual summands also, the degree of closeness of two random variables being measured by the distance function
$$d(F, G) = \sup_x |F(x) - G(x)|$$
between their respective distribution functions $F$ and $G$. Analogous results are already known for the Normal and the Poisson laws.

1. INTRODUCTION 

Let $X_1$ and $X_2$ be independent (real-valued) random variables, and let
$$X = X_1 + X_2.$$
In the theory of decomposition of random variables, it is well known that if $X$ has a Normal (respectively a Poisson, a Binomial) distribution, then the same is true of the summands $X_1$ and $X_2$ also. Details on these results, which are due respectively to P. Lévy-H. Cramér, D. A. Raikov, and N. A. Sapogov-H. Teicher, can be found, for instance, in Lukacs (1960), Chapter 8.

Sapogov (1951 and 1959) established a 'stability theorem' for the Normal law, namely, that if $X$ has a distribution which is sufficiently close to a Normal distribution, then the same is true of $X_1$ and $X_2$ also, the degree of closeness between two distribution functions being measured by the 'Kolmogorov distance'
$$d(F, G) = \sup_x |F(x) - G(x)|.$$

An analogous result for the Poisson law was established by Shalaevsky (1959). It is the purpose of this paper to establish a similar result for the Binomial law.


2. SOME NOTATIONS


We shall use the following notations.

For all real $x$,
$$B(x; n, p) = \sum_{r \le x} \binom{n}{r} p^r q^{n-r}, \qquad q = 1-p,$$
$r$ running through all integer-values, where we define (for convenience of notation):
$$\binom{n}{r} = 0 \quad \text{for } r < 0 \text{ and } r > n.$$
$F(x)$, $F_1(x)$ and $F_2(x)$ will respectively denote the distribution functions of $X$, $X_1$ and $X_2$. (All distribution functions will be assumed to be right-continuous.)

* Research supported by the Council of Scientific and Industrial Research, Government of India, under their "Scientific Pool" scheme.
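
The two objects just introduced, the Binomial distribution function $B(x; n, p)$ and the Kolmogorov distance, can be evaluated directly for small cases. The sketch below is only illustrative: the function names are hypothetical, and `kolmogorov_distance` assumes that both distribution functions are right-continuous step functions whose jumps all lie in the supplied list of points, so that the supremum is attained at one of them.

```python
from math import comb, floor

def binomial_cdf(x, n, p):
    """B(x; n, p): sum of C(n, r) p^r q^(n-r) over integers r <= x (right-continuous)."""
    if x < 0:
        return 0.0
    k = min(floor(x), n)
    q = 1 - p
    return sum(comb(n, r) * p**r * q**(n - r) for r in range(k + 1))

def kolmogorov_distance(F, G, jump_points):
    """sup_x |F(x) - G(x)| for step functions with jumps only at jump_points.

    Between consecutive jump points both functions are constant, so it is
    enough to compare them at the jump points themselves.
    """
    return max(abs(F(x) - G(x)) for x in jump_points)

if __name__ == "__main__":
    F = lambda x: binomial_cdf(x, 1, 0.5)
    G = lambda x: binomial_cdf(x, 2, 0.5)
    print(kolmogorov_distance(F, G, range(0, 3)))
```

For distribution functions that are not purely discrete the supremum would have to be approximated differently; that case is outside this sketch.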
3. STATEMENT AND PROOF OF THE THEOREM


Theorem: Let $\varepsilon > 0$ be a sufficiently small number (the degree of smallness required will be made clear in course of the proof), and let
$$\sup_x |F(x) - B(x; n, p)| \le \varepsilon. \tag{3.1}$$
Then there exist integers $n_1$, $n_2$, and a positive constant $k$ independent of $\varepsilon$, such that
$$0 \le n_1,\ n_2 \le n; \qquad n_1 + n_2 = n \tag{3.2}$$
$$\sup_x |F_1(x) - B(x - c; n_1, p)| \le k\,\varepsilon^{1/(2n)}, \tag{3.3}$$
$$\sup_x |F_2(x) - B(x + c; n_2, p)| \le k\,\varepsilon^{1/(2n)}, \tag{3.4}$$
where
$$c = \sup\{a \mid P(X_1 < a) \le \sqrt{\varepsilon}\}. \tag{3.5}$$

Proof: We shall assume without loss of generality that $c = 0$. (Otherwise, we need only consider the random variables $X_1 - c$ and $X_2 + c$ instead of $X_1$ and $X_2$ respectively.)


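Then we have
$$P(X_1 < 0) \le \sqrt{\varepsilon}. \tag{3.6}$$
Again, for any $\delta > 0$, $P(X_1 < \delta) > \sqrt{\varepsilon}$. Hence,
$$P(X_1 \le 0) \ge \sqrt{\varepsilon}. \tag{3.7}$$
From (3.1), we have in particular that
$$P(X < 0) \le \varepsilon.$$
Hence
$$\sqrt{\varepsilon}\, P(X_2 < 0) \le P(X_1 \le 0)\, P(X_2 < 0) \le P(X < 0) \le \varepsilon$$
so that
$$P(X_2 < 0) \le \sqrt{\varepsilon}. \tag{3.8}$$
Now, for $j, k = 1, 2$; $j \ne k$, the event $(X_j \ge 0,\ X_k > 0)$ implies the event $(X > 0)$. Hence
$$(X \le 0) \subset (X_j < 0) \cup (X_k \le 0)$$
so that
$$P(X \le 0) \le P(X_j < 0) + P(X_k \le 0) = P(X_j < 0) + P(X_k < 0) + P(X_k = 0)$$
for $j = 1, 2$. Hence, from (3.1), (3.6) and (3.8), we have for $k = 1, 2$
$$q^n - \varepsilon \le P(X \le 0) \le 2\sqrt{\varepsilon} + P(X_k = 0)$$
which implies that
$$P(X_k = 0) \ge \tfrac{1}{2} q^n \qquad (k = 1, 2) \tag{3.9}$$
$\varepsilon$ being assumed to be so small that
$$2\sqrt{\varepsilon} + \varepsilon \le \tfrac{1}{2} q^n. \tag{3.10}$$
Hence, for $j, k = 1, 2$; $j \ne k$, we have
$$\tfrac{1}{2} q^n\, P(X_j < 0) \le P(X_k = 0)\, P(X_j < 0) \le P(X < 0) \le \varepsilon$$
so that
$$P(X_j < 0) \le 2\varepsilon q^{-n} \qquad (j = 1, 2). \tag{3.11}$$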
Remark: However, the estimates (3.11) for $P(X_j < 0)$ are more of interest than of importance to us: the estimates (3.6) and (3.8) are quite adequate for our purposes as will be seen during the derivation of relation (3.25).

Now, from (3.1), it follows that for all real $x$,
$$|F(x-0) - B(x-0; n, p)| = \lim_{y \to x-0} |F(y) - B(y; n, p)| \le \varepsilon.$$
Hence, for all integers $r$,
$$\Big| P(r \le X < r+1) - \binom{n}{r} p^r q^{n-r} \Big| = \big| [F(r+1-0) - F(r-0)] - [B(r+1-0; n, p) - B(r-0; n, p)] \big|$$
$$\le |F(r+1-0) - B(r+1-0; n, p)| + |F(r-0) - B(r-0; n, p)| \le 2\varepsilon. \tag{3.12}$$
By a similar reasoning,
$$P(r < X < r+1) \le 2\varepsilon. \tag{3.13}$$
Hence, for $j, k = 1, 2$; $j \ne k$, and for all integers $r$, by (3.9),
$$\tfrac{1}{2} q^n\, P(r < X_j < r+1) \le P(X_k = 0)\, P(r < X_j < r+1) \le P(r < X < r+1) \le 2\varepsilon$$
whence
$$P(r < X_j < r+1) \le 4\varepsilon q^{-n} \qquad (j = 1, 2). \tag{3.14}$$
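We now define
$$a = \inf\{x \mid P(X_1 > x) \le 2\sqrt{\varepsilon}\, q^{-n}\}. \tag{3.15}$$
We will assume $\varepsilon$ to be so small that $a \ge c$ ($= 0$ by our assumption). From the definition of $a$,
$$P(X_1 > a) \le 2\sqrt{\varepsilon}\, q^{-n} \tag{3.16}$$
and $P(X_1 > x) > 2\sqrt{\varepsilon}\, q^{-n}$ for all $x < a$, whence
$$P(X_1 \ge a) \ge 2\sqrt{\varepsilon}\, q^{-n}. \tag{3.17}$$
We note that, since $P(X > n) \le \varepsilon$ from (3.1), and $P(X_2 = 0) \ge \tfrac{1}{2} q^n$ from (3.9),
$$P(X_1 > n) \le P(X > n)/P(X_2 = 0) \le 2\varepsilon q^{-n} < 2\sqrt{\varepsilon}\, q^{-n}$$
so that
$$n \ge a. \tag{3.18}$$
Let
$$n_1 = [a], \quad \text{and} \quad n_2 = n - n_1 \tag{3.19}$$
so that $0 \le n_1,\ n_2 \le n$; $n_1 + n_2 = n$. (As usual, $[a]$ denotes the largest integer not greater than $a$.)

We note the further facts below:
$$P(n_1 < X_1 < n_1 + 1) \le 4\varepsilon q^{-n}$$
by (3.14), and, since $a < n_1 + 1$, we have by (3.16),
$$P(X_1 \ge n_1 + 1) \le P(X_1 > a) \le 2\sqrt{\varepsilon}\, q^{-n}.$$
Hence
$$P(X_1 > n_1) \le 6\sqrt{\varepsilon}\, q^{-n}. \tag{3.20}$$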
Also, since $n_1 \le a$,
$$P(X_1 \ge n_1) \ge P(X_1 \ge a) \ge 2\sqrt{\varepsilon}\, q^{-n} \quad \text{from (3.17)},$$
whence
$$P(X_2 > n_2) \le P(X > n)/P(X_1 \ge n_1) \le \sqrt{\varepsilon}\, q^{n}. \tag{3.21}$$
From (3.20) and (3.21), it follows that ($c_1, c_2, \ldots$ denoting positive constants not depending on $\varepsilon$) for $j = 1, 2$
$$P(X_j > n_j) \le c_1 \sqrt{\varepsilon}. \tag{3.22}$$
We now define the new random variables (for $j = 1, 2$)
$$Y_j = \begin{cases} 0 & \text{if } X_j < 0 \text{ or } X_j > n_j \\ r & \text{if } r \le X_j < r+1, \text{ for } 0 \le r < n_j \\ n_j & \text{if } X_j = n_j. \end{cases}$$
Then, the independence of $X_1$ and $X_2$ implies that of $Y_1$ and $Y_2$. Let $Y = Y_1 + Y_2$. We estimate the quantity
$$D_r = P(r \le X < r+1) - P(r \le Y < r+1) = P(r \le X < r+1) - P(Y = r) \tag{3.23}$$
where $r$ is an arbitrary integer. Let
$$P_1 = P(r \le X < r+1;\ X \ne Y)$$
and
$$P_2 = P(Y = r;\ X \ne Y).$$
It is clear that $D_r = P_1 - P_2$; and $P_1, P_2 \le P(X \ne Y)$, and hence that
$$|D_r| \le P_1 + P_2 \le 2P(X \ne Y). \tag{3.24}$$


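We therefore proceed to estimate $P(X \ne Y)$: since the event $(X_1 = Y_1;\ X_2 = Y_2)$ implies the event $(X = Y)$, it follows that
$$(X \ne Y) \subset (X_1 \ne Y_1) \cup (X_2 \ne Y_2)$$
so that
$$P(X \ne Y) \le P(X_1 \ne Y_1) + P(X_2 \ne Y_2) = \sum_{j=1}^{2} \Big[ P(X_j < 0) + P(X_j > n_j) + \sum_{r=0}^{n_j - 1} P(r < X_j < r+1) \Big]$$
$$\le \sum_{j=1}^{2} \big[ 2\varepsilon q^{-n} + c_1\sqrt{\varepsilon} + 4\varepsilon q^{-n}\, n_j \big],$$
using (3.11), (3.22) and (3.14) respectively,
$$\le c_2 \sqrt{\varepsilon}. \tag{3.25}$$
Hence, from (3.24) and (3.25),
$$|D_r| \le 2c_2\sqrt{\varepsilon}. \tag{3.26}$$
We then have from (3.12) and (3.26) that for all integers $r$ (recalling the notation in Section 2)
$$\Big| P(Y = r) - \binom{n}{r} p^r q^{n-r} \Big| \le c_3 \sqrt{\varepsilon}. \tag{3.27}$$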
Let now $g(z)$, $g_1(z)$ and $g_2(z)$ be the probability-generating functions of the variables $Y$, $Y_1$ and $Y_2$ respectively. They are polynomials of degrees $n$, $n_1$ and $n_2$ respectively, and (for all complex $z$)
$$g_1(z) \cdot g_2(z) = g(z) = (q + pz)^n + \sum_{r=0}^{n} \varepsilon_r z^r \tag{3.28}$$
where, by (3.27),
$$|\varepsilon_r| \le c_3\sqrt{\varepsilon}. \tag{3.29}$$
We present the argument henceforward in rather condensed form, to avoid trivial and at the same time cumbersome details.

Let these three polynomials in $z$ transform respectively into the polynomials $f(w)$, $f_1(w)$ and $f_2(w)$, under the transformation $w = z + (q/p)$. By a straight forward


heorem on the zeros of analytic functions (see, for instance 
: ; 


(if e is sufficiently small) all the zeros 


o that, if e be so small that 


application of Rouche's t 
Titehmarsh (1939, p. 116)), 
of f(w) = g(z) lie in the circle poj < [evel 


leal (< tave) < 4p", then all the zeros of f 
Of these zeros, nj belong to filo) and the remaining na to 
y simple and obvious steps) that 


it is easily seen that 
ENPA En)" S 
(w) lie in |w] < (2ejvVe)”jp = ee”. 
fon). Hence it follows 


(omitting a fev 
nı mi i 
az (q+?) pie 


ny 

ple) = tp) + Pe 
| is less than ¢,é/". Hence it follows that 

(G(x) Bl; nj, p)| < vjetit 
n of Y; US 1, 2). 
due to Sapogov (1951). 
the distribution functions of the random 
then for all real së 


where every |æ] and |% 
(3.30) 


je distribution functio 


where G(x) is ti 
mma 


We now invoke 4 simple le 
Lemma: If Hi) and H,{x) are 
variables Z, and Z, (not necessarily independent), 
|H (2) Hale | < P(A, F Zo) 


> 2)D (Za > ® Zi—Zą > 0) 80 that the event 


Proof: The event (Zi 


(Z, < Ca £ US 
andl jës, PL; Ka) < P( Zs < v)--P(Z— Ze 0), 
that is, H,(a)—Hdë) < P(Z,—22 <0)< P(Z, F Zo). 
ained from it by interchanging the subseripts, 


| relation obt 
t of the lemma. 


and of the 


m,: . 
This relation, and the dua 


together yield the statemen 
argument leading to relation (3.25), 


In view of the above lemma 
we immediately have, forj = 1, 2; ; 
P(X;# Y) < tay € (3.31) 


I Fe) -E | < 


Relations (3.30) and (3.31) yield the assertion of our theorem for the case $c = 0$, where $c$ is defined by (3.5). As we have already pointed out, it is sufficient to consider this case, since the general case can be reduced to this case by considering the random variables $X_1 - c$ and $X_2 + c$ instead of $X_1$ and $X_2$ respectively and applying the above argument to these new variables.
THE LIMITING DISTRIBUTION OF THE VIRTUAL WAITING
TIME AND THE QUEUE SIZE FOR A SINGLE-SERVER
QUEUE WITH RECURRENT INPUT AND
GENERAL SERVICE TIMES*

By LAJOS TAKÁCS


Columbia University, New York


SUMMARY. A single-server queueing process with recurrent input and general service times is considered and the limiting distribution of the virtual waiting time and that of the queue size are determined.

1. INTRODUCTION 

Suppose that in the time interval $(0, \infty)$ customers arrive at a counter at times $\tau_1, \tau_2, \ldots, \tau_n, \ldots$. The customers are served by a single server in order of arrival. The server is idle if and only if there is no customer in the system. Denote by $\chi_n$ the service time of the $n$-th customer. It is supposed that the service times $\chi_n$ $(n = 1, 2, \ldots)$ and the interarrival times $\theta_n = \tau_{n+1} - \tau_n$ $(n = 1, 2, \ldots)$ are independent sequences of identically distributed, mutually independent, positive random variables with distribution functions
$$P\{\chi_n \le x\} = H(x) \tag{1.1}$$
and
$$P\{\theta_n \le x\} = F(x). \tag{1.2}$$
Let $E\{\chi_n\} = \alpha$ and $E\{\theta_n\} = \beta$. Throughout this paper $\alpha$ and $\beta$ are supposed to be finite and the trivial case $P\{\chi_n = \theta_n\} = 1$ is excluded.

Denote by $\eta(t)$ the virtual waiting time at time $t$, i.e., $\eta(t)$ is the time that a customer would have to wait if he arrived at time $t$. Let $\eta_n = \eta(\tau_n - 0)$; $\eta_n$ is the actual waiting time of the $n$-th arriving customer. Denote by $\xi(t)$ the queue size at time $t$, i.e., the total number of customers (either waiting or being served) in the system at time $t$. Let $\xi_n = \xi(\tau_n - 0)$; i.e., $\xi_n$ is the queue size immediately before the arrival of the $n$-th customer.

In what follows we shall determine the limiting distribution of $\eta(t)$ and that of $\xi(t)$ as $t \to \infty$. We note here that the distribution of the queue size is independent of the order of service.

At this point I should like to mention briefly the idea which leads to the notion of virtual waiting time. One can suppose without loss of generality, that each customer is assigned his service time in advance at his arrival, because the service times are identically distributed, mutually independent random variables and independent of the arrival times. Suppose that we use a clock mechanism, and each time a customer arrives we set the hand forward by his future service time. Since this clock runs as long as there are customers in the system, it will at any given instant show the appropriate virtual waiting time. Thus an arriving customer can

*This research was sponsored by the Office of Naval Research under Contract Number Nonr-266 (59).
immediately see his own actual waiting time on this clock. $\eta(t)$ can also be interpreted as the occupation time of the server at time $t$, that is, the time that is needed to complete the service of all those customers who arrived before $t$. In certain queues $\eta(t)$ has a real physical meaning. For instance, if we consider reading of messages in a telegraph office, then $\eta(t)$ can be interpreted as the length of all messages which remain to be read at time $t$.
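
The clock description above translates almost literally into a short computation. The sketch below is an illustration under our own assumptions: the system is empty at time 0, arrival times are sorted and non-negative, $t \ge 0$, and the function names are hypothetical. The first function uses the standard recursion $\eta_{n+1} = \max(0, \eta_n + \chi_n - \theta_n)$ for actual waiting times, which is not written out in the text; the second runs the clock, which is set forward by each service time on arrival and runs down at unit rate while it is positive.

```python
def actual_waits(arrivals, services):
    """Actual waiting times eta_n of successive customers (first-come, first-served).

    Uses the standard recursion eta_{n+1} = max(0, eta_n + chi_n - theta_n),
    where theta_n is the gap between the n-th and (n+1)-th arrivals.
    """
    waits, w = [], 0.0
    for n in range(len(arrivals)):
        waits.append(w)
        if n + 1 < len(arrivals):
            gap = arrivals[n + 1] - arrivals[n]
            w = max(0.0, w + services[n] - gap)
    return waits

def virtual_wait(t, arrivals, services):
    """Virtual waiting time eta(t): the reading of the clock at time t >= 0.

    Arrivals at exactly time t are counted, so eta(t) is right-continuous
    and the actual wait of a customer is the value just before his arrival.
    """
    clock, last = 0.0, 0.0
    for tau, chi in zip(arrivals, services):
        if tau > t:
            break
        clock = max(0.0, clock - (tau - last)) + chi  # run down, then set forward
        last = tau
    return max(0.0, clock - (t - last))

if __name__ == "__main__":
    arrivals, services = [0, 1, 5], [3, 2, 1]
    print(actual_waits(arrivals, services))
    print([virtual_wait(t, arrivals, services) for t in (0.5, 1, 4.5, 5)])
```

Evaluating `virtual_wait` just before an arrival time reproduces the corresponding entry of `actual_waits`, which is the relation $\eta_n = \eta(\tau_n - 0)$ stated earlier.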

The process $\{\eta(t)\}$ has interest not only in the theory of queues but also in the investigation of operation of dams. If $\eta(t)$ denotes the content of a dam at time $t$, then $\{\eta(t)\}$ has the same stochastic behaviour as the virtual waiting time in a queueing process (cf. Gani and Prabhu, 1959).


Finally, we introduce the following Laplace-Stieltjes transforms:
$$\psi(s) = \int_0^\infty e^{-sx}\, dH(x) \tag{1.3}$$
and
$$\varphi(s) = \int_0^\infty e^{-sx}\, dF(x) \tag{1.4}$$
which are convergent if $\Re(s) \ge 0$.

2. THE LIMITING DISTRIBUTION OF THE ACTUAL WAITING TIME 


The following results have been proved by Lindley (1952): If $\alpha < \beta$, then the limiting distribution $\lim_{n\to\infty} P\{\eta_n \le x\} = W(x)$ exists, independent of the initial state, and it is the unique solution of the following integral equation of Wiener-Hopf type
$$W(x) = \begin{cases} \displaystyle\int_0^\infty K(x - y)\, dW(y) & \text{if } x \ge 0, \\ 0 & \text{if } x < 0 \end{cases} \tag{2.1}$$
where
$$K(x) = \int_0^\infty H(x + y)\, dF(y) \tag{2.2}$$
and further $W(0) > 0$. If $\alpha \ge \beta$ (the trivial case $P\{\chi_n = \theta_n\} = 1$ is excluded), then $P\{\lim_{n\to\infty} \eta_n = \infty\} = 1$, whence it follows that $\lim_{n\to\infty} P\{\eta_n \le x\} = 0$ for every $x$ irrespective of the initial state.

Define the event $\mathcal{E}$ such that $\mathcal{E}$ is said to occur at the $n$-th arrival if the server is found to be idle at that time. Evidently, $\mathcal{E}$ is a recurrent event. If $\alpha \le \beta$, then $\mathcal{E}$ is persistent, and if $\alpha > \beta$, then $\mathcal{E}$ is transient. (As to the theory of recurrent events we refer to Feller (1957), pp. 278-320.)

Denote by $R(x)$ the probability that the distance between two successive occurrences of $\mathcal{E}$ is $\le x$. If $\alpha \le \beta$, then $R(\infty) = 1$, i.e., $R(x)$ is a proper distribution function. The mean recurrence time of $\mathcal{E}$ is
$$\rho = \int_0^\infty x\, dR(x) = \int_0^\infty [1 - R(x)]\, dx = \beta / W(0). \tag{2.3}$$
If $\alpha < \beta$, then $\rho < \infty$, whereas if $\alpha = \beta$, then $\rho = \infty$. If $\alpha > \beta$, then $R(\infty) < 1$.

Finally, we note that if $F(x)$ is not a lattice distribution function, then $R(x)$ is not one either.


3. THE LIMITING DISTRIBUTION OF THE VIRTUAL WAITING TIME 


We shall prove the following theorem. 


Theorem 1: If $\alpha < \beta$ and $F(x)$ is not a lattice distribution function, then the limiting distribution
$$\lim_{t\to\infty} P\{\eta(t) \le x\} = W^*(x) \tag{3.1}$$
exists, independent of the initial state, and is given by
$$W^*(x) = \Big(1 - \frac{\alpha}{\beta}\Big) + \frac{\alpha}{\beta}\, W(x) * H^*(x) \tag{3.2}$$
where $W(x)$ is defined by (2.1),
$$H^*(x) = \begin{cases} \dfrac{1}{\alpha}\displaystyle\int_0^x [1 - H(y)]\, dy & \text{if } x \ge 0, \\ 0 & \text{if } x < 0, \end{cases} \tag{3.3}$$
and $*$ denotes convolution. If $\alpha \ge \beta$ (the trivial case $P\{\chi_n = \theta_n\} = 1$ is excluded), then $\lim_{t\to\infty} P\{\eta(t) \le x\} = 0$ for every $x$, irrespective of the initial state.

Proof: The proof consists of two parts. First we prove that the limit exists and then we find the explicit form of the limiting distribution in case of $\alpha < \beta$. We need the following lemma.

Lemma 1: Let $A$ be an event which has the following property: If $A$ occurs at time $u$ and does not occur at time $u+t$, then this implies that at least one customer arrives in the interval $(u, u+t]$. Denote by $P_A(t)$ the probability that the system is in state $A$ at time $t$. If $\alpha \le \beta$ and $F(x)$ is not a lattice distribution function, then $\lim_{t\to\infty} P_A(t) = P_A$ exists and is independent of the initial state.

Proof: Denote by $M(t)$ the expected number of occurrences of $\mathcal{E}$ in the time interval $(0, t]$. Let $Q(t)$ denote the probability that the system is in state $A$ at time $t$ and $\mathcal{E}$ never occurs in the interval $(0, t]$. Measuring time from an occurrence of $\mathcal{E}$ denote by $Q_A(t)$ the probability that $A$ occurs at time $t$ and $\mathcal{E}$ never occurs during the interval $(0, t]$. Evidently
$$P_A(t) = Q(t) + \int_0^t Q_A(t - u)\, dM(u). \tag{3.4}$$
If $\alpha \le \beta$, then $\mathcal{E}$ is a persistent event and consequently,
$$\lim_{t\to\infty} Q(t) = \lim_{t\to\infty} Q_A(t) = 0.$$
If $Q_A(t)$ is of bounded variation in every finite interval and $F(x)$ is not a lattice distribution function, then by a renewal theorem
$$\lim_{t\to\infty} \frac{M(t + h) - M(t)}{h} = \frac{1}{\rho} \tag{3.5}$$
where $\rho$ is defined by (2.3). Thus if $\alpha < \beta$ and $F(x)$ is not a lattice distribution function, then it follows from (3.4) and (3.5) that

$$\lim_{t\to\infty} P_A(t) = \frac{1}{\rho}\int_0^\infty Q_A(u)\,du \tag{3.6}$$

irrespective of the initial state (cf. Smith (1958) and Takács (1962), pp. 227-228). Since

$$Q_A(u) \le 1 - R(u) \tag{3.7}$$

for $u \ge 0$, the integral on the right hand side of (3.6) converges.


It remains to prove that $Q_A(u)$ is of bounded variation in any finite interval $[0, t]$. The proof is based on an idea of Smith (1955). Measure time from an occurrence of $\mathcal{E}$. Denote by $\nu(t)$ the number of arrivals in the interval $(0, t]$. Define $\chi_t = 1$ if $A$ occurs at time $t$ and $\mathcal{E}$ does not occur in the interval $(0, t]$; $\chi_t = 0$ otherwise. For $0 \le u \le t$ we have

$$|Q_A(t) - Q_A(u)| \le E\{\chi_t - \chi_u\} + 2E\{\nu(t) - \nu(u)\}. \tag{3.8}$$

For, $Q_A(t) - Q_A(u) = E\{\chi_t - \chi_u\} = P\{\chi_u = 0, \chi_t = 1\} - P\{\chi_u = 1, \chi_t = 0\}$, whence

$$|Q_A(t) - Q_A(u)| \le E\{\chi_t - \chi_u\} + 2P\{\chi_u = 1, \chi_t = 0\}$$

and by assumption

$$P\{\chi_u = 1, \chi_t = 0\} \le P\{\nu(t) - \nu(u) \ge 1\} \le E\{\nu(t) - \nu(u)\}.$$

Accordingly for any subdivision $0 = t_0 < t_1 < \dots < t_n = t$ of the interval $[0, t]$

$$\sum_{i=1}^{n} |Q_A(t_i) - Q_A(t_{i-1})| \le E\{\chi_t - \chi_0\} + 2E\{\nu(t)\} \le 1 + 2E\{\nu(t)\}. \tag{3.9}$$

Since $E\{\nu(t)\}$ is finite for every $t \ge 0$ ($E\{\nu(t)\} \le Ct + 1$ where $C$ is a positive constant), it follows that $Q_A(t)$ is of bounded variation in $[0, t]$. This completes the proof of Lemma 1.


Remark 1: Let $\alpha < \beta$ and $F(x)$ be a non-lattice distribution function. Consider a monotone non-decreasing sequence of events $A_1 \subset A_2 \subset \dots \subset A_k \subset \dots$ for which $A_k$ satisfies the assumptions of Lemma 1 and $\lim_{k\to\infty} A_k = \Omega$, the sure event. Let $\lim_{t\to\infty} P_{A_k}(t) = P_{A_k}$, defined by (3.6). Then $\lim_{k\to\infty} P_{A_k} = 1$. For, in this case $Q_{A_k}(u)$ $(k = 1, 2, \dots)$ is a monotone non-decreasing sequence and $\lim_{k\to\infty} Q_{A_k}(u) = 1 - R(u)$, whence by Beppo Levi's theorem (cf. Riesz and Sz.-Nagy (1959) p. 34)

$$\lim_{k\to\infty} \int_0^\infty Q_{A_k}(u)\,du = \int_0^\infty [1 - R(u)]\,du = \rho.$$

The statement follows from (3.6).


Now we shall prove that if $\alpha < \beta$ and $F(x)$ is not a lattice distribution function, then the limit (3.1) exists and is independent of the initial state. Define $A$ as the event that the virtual waiting time is $\le x$, where $x \ge 0$. Then the event $A$ satisfies the assumptions of Lemma 1 and $P_A(t) = P\{\eta(t) \le x\}$. Thus by Lemma 1 the limit (3.1) exists and $W^*(x)$ is a monotone non-decreasing function of $x$ and by Remark 1 $W^*(\infty) = 1$.

If $\alpha > \beta$, then $\lim_{t\to\infty} P\{\eta(t) \le x\} = 0$ for every $x$ irrespective of the initial state. Since $\eta(t) \ge \eta_n$ for $\tau_{n-1} \le t < \tau_n$ and by Lindley's theorem $P\{\lim_{n\to\infty} \eta_n = \infty\} = 1$ for $\alpha > \beta$, we can conclude also that in this case $P\{\lim_{t\to\infty} \eta(t) = \infty\} = 1$.
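Remark 2: Now we shall prove directly that $P\{\lim_{t\to\infty} \eta(t) = \infty\} = 1$ if $\alpha > \beta$. Denote by $\nu(t)$ the number of arrivals in the interval $(0, t]$. By a theorem of Doob (1948) we have

$$P\Big\{\lim_{t\to\infty} \frac{\nu(t)}{t} = \frac{1}{\beta}\Big\} = 1. \tag{3.10}$$

Since obviously $\eta(t) \ge \eta(0) + \sum_{i=1}^{\nu(t)} \chi_i - t$, we have

$$\frac{\eta(t)}{t} \ge \frac{\eta(0)}{t} + \frac{\nu(t)}{t}\cdot\frac{1}{\nu(t)}\sum_{i=1}^{\nu(t)} \chi_i - 1. \tag{3.11}$$

If $t \to \infty$ in (3.11), then we have with probability one that $\eta(0)/t \to 0$, $\nu(t)/t \to 1/\beta$ and $\frac{1}{\nu(t)}\sum_{i=1}^{\nu(t)} \chi_i \to \alpha$. The latter follows from an easy extension of the strong law of large numbers. Thus by (3.11)

$$\liminf_{t\to\infty} \frac{\eta(t)}{t} \ge \frac{\alpha}{\beta} - 1 > 0$$

with probability one, whence

$$P\{\lim_{t\to\infty} \eta(t) = \infty\} = 1. \tag{3.12}$$

This proves that if $\alpha > \beta$, then $\lim_{t\to\infty} P\{\eta(t) \le x\} = 0$ for every $x$ irrespective of the initial state.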


Finally, it remains only to find $\lim_{t\to\infty} P\{\eta(t) \le x\} = W^*(x)$ if $\alpha < \beta$ and $F(x)$ is not a lattice distribution function.

First we define a random variable $\theta(t)$ as the distance between $t$ and the first arrival after $t$. Then we observe that the vector variables $\{\eta(t), \theta(t)\}$ form a Markov process. The initial state is given by $(\eta(0), \theta(0))$ where $\eta(0)$ is the initial occupation time of the server and $\theta(0) = \tau_1$ is the time of the first arrival. (We note that if the input is a Poisson process, then $\{\eta(t)\}$ is a Markov process in itself.)

Define now $A$ as follows: $A$ occurs at time $t$ if $\eta(t) \le x$ and $\theta(t) \le y$ where $x \ge 0$ and $y \ge 0$. This $A$ satisfies the assumptions of Lemma 1 and if $\alpha < \beta$ and $F(x)$ is not a lattice distribution function, then by Lemma 1 we can conclude that

$$\lim_{t\to\infty} P\{\eta(t) \le x, \theta(t) \le y\} = W^*(x, y) \tag{3.13}$$
exists and is independent of the initial state. $W^*(x, y)$ is a two dimensional distribution function, because by Remark 1 $W^*(\infty, \infty) = 1$. Let

$$\Omega^*(s, w, t) = E\{e^{-s\eta(t) - w\theta(t)}\} \tag{3.14}$$

and

$$\Omega^*(s, w) = \int_0^\infty\!\!\int_0^\infty e^{-sx - wy}\, d_x d_y W^*(x, y). \tag{3.15}$$

If $\alpha < \beta$ and $F(x)$ is not a lattice distribution function, then by (3.13)

$$\lim_{t\to\infty} \Omega^*(s, w, t) = \Omega^*(s, w) \tag{3.16}$$

for $\Re(s) \ge 0$ and $\Re(w) \ge 0$. If

$$\Omega^*(s) = \int_0^\infty e^{-sx}\,dW^*(x), \tag{3.17}$$

then obviously $\Omega^*(s) = \Omega^*(s, 0)$.


Now we shall prove the following lemma. 


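Lemma 2: Denote by $m(t)$ the expected number of arrivals in the time interval $(0, t]$. If $m(t+\Delta t) - m(t) = O(\Delta t)$, then

$$\frac{\Omega^*(s, w, t+\Delta t) - \Omega^*(s, w, t)}{\Delta t} = (w+s)\,\Omega^*(s, w, t) - sP_0(t)\Phi(w, t) - [1 - \psi(s)\varphi(w)]\,\Omega(s, t)\,\frac{m(t+\Delta t) - m(t)}{\Delta t} + o(1), \tag{3.18}$$

where $P_0(t) = P\{\eta(t) = 0\}$, $\Omega(s, t) = E\{e^{-s\eta(t)} \mid \theta(t) = 0\}$ and $\Phi(w, t) = E\{e^{-w\theta(t)} \mid \eta(t) = 0\}$.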
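Proof: If $\theta(t) > \Delta t$, then $\theta(t+\Delta t) = \theta(t) - \Delta t$ and $\eta(t+\Delta t) = \max(0, \eta(t) - \Delta t)$. Thus

$$E\{e^{-s\eta(t+\Delta t) - w\theta(t+\Delta t)} \mid \theta(t) > \Delta t\}\,P\{\theta(t) > \Delta t\}$$
$$= P\{\eta(t) = 0, \theta(t) > \Delta t\}\,E\{e^{-w\theta(t)} \mid \eta(t) = 0, \theta(t) > \Delta t\}(1 + w\Delta t)$$
$$+ P\{0 < \eta(t) \le \Delta t, \theta(t) > \Delta t\}\,E\{e^{-w\theta(t)} \mid 0 < \eta(t) \le \Delta t, \theta(t) > \Delta t\}(1 + w\Delta t)$$
$$+ P\{\eta(t) > \Delta t, \theta(t) > \Delta t\}\,E\{e^{-s\eta(t) - w\theta(t)} \mid \eta(t) > \Delta t, \theta(t) > \Delta t\}[1 + (w+s)\Delta t] + o(\Delta t).$$

If $\theta(t) \le \Delta t$, then $\theta(t+\Delta t) = \theta - \epsilon_1\Delta t$ and $\eta(t+\Delta t) = \eta(t) + \chi - \epsilon_2\Delta t$ where $\chi$ is the total service time of all those customers who arrive in the interval $(t, t+\Delta t]$, $\theta$ is the distance between the last arrival in $(t, t+\Delta t]$ and the first arrival after $t+\Delta t$, $0 \le \epsilon_1 \le 1$ and $0 \le \epsilon_2 \le 1$. Thus

$$E\{e^{-s\eta(t+\Delta t) - w\theta(t+\Delta t)} \mid \theta(t) \le \Delta t\} = \psi(s)\varphi(w)\,E\{e^{-s\eta(t)} \mid \theta(t) \le \Delta t\} + o(1).$$

Since $P\{\theta(t) \le \Delta t\} \le m(t+\Delta t) - m(t) = O(\Delta t)$ and $P\{0 < \eta(t) \le \Delta t\} \to 0$ as $\Delta t \to 0$, we obtain by the theorem of total expectation that

$$\Omega^*(s, w, t+\Delta t) = [1 + (w+s)\Delta t]\,\Omega^*(s, w, t) - s\Delta t\,P\{\eta(t) = 0, \theta(t) > \Delta t\}\,E\{e^{-w\theta(t)} \mid \eta(t) = 0, \theta(t) > \Delta t\}$$
$$- [1 - \psi(s)\varphi(w)]\,E\{e^{-s\eta(t)} \mid \theta(t) \le \Delta t\}\,P\{\theta(t) \le \Delta t\} + o(\Delta t). \tag{3.19}$$

Since $P\{\theta(t) \le \Delta t\} = m(t+\Delta t) - m(t) + o(\Delta t)$ also holds, we obtain

$$\Omega^*(s, w, t+\Delta t) = [1 + (w+s)\Delta t]\,\Omega^*(s, w, t) - s\Delta t\,P_0(t)\Phi(w, t) - [1 - \psi(s)\varphi(w)]\,E\{e^{-s\eta(t)} \mid \theta(t) = 0\}\,[m(t+\Delta t) - m(t)] + o(\Delta t), \tag{3.20}$$

which is in agreement with (3.18).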
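By Lemma 1 the following limits exist: $\lim_{t\to\infty} \Omega^*(s, w, t) = \Omega^*(s, w)$ [cf. (3.16)], $\lim_{t\to\infty} P_0(t) = P_0^*$ (it is easy to prove directly that $P_0^* = 1 - \alpha/\beta$ [cf. Takács (1962) p. 142]) and $\lim_{t\to\infty} \Phi(w, t) = \Phi^*(w)$, say. By Lindley's theorem $\lim_{t\to\infty} \Omega(s, t) = \Omega(s)$ where

$$\Omega(s) = \int_0^\infty e^{-sx}\,dW(x) \tag{3.21}$$

and $W(x)$ is defined by (2.1). By Blackwell's theorem

$$\lim_{t\to\infty} \frac{m(t+\Delta t) - m(t)}{\Delta t} = \frac{1}{\beta}. \tag{3.22}$$

If we let $t \to \infty$ in (3.18) we obtain

$$(w+s)\,\Omega^*(s, w) = sP_0^*\Phi^*(w) + \frac{1 - \psi(s)\varphi(w)}{\beta}\,\Omega(s). \tag{3.23}$$

If $w \to 0$ in (3.23), then we get

$$\Omega^*(s) = P_0^* + \frac{1 - \psi(s)}{\beta s}\,\Omega(s). \tag{3.24}$$

Since $\Omega^*(0) = 1$, we obtain that $P_0^* = 1 - \alpha/\beta$. Thus finally the Laplace-Stieltjes transform of $W^*(x)$ is given by

$$\Omega^*(s) = \Big(1 - \frac{\alpha}{\beta}\Big) + \frac{\alpha}{\beta}\cdot\frac{1 - \psi(s)}{\alpha s}\,\Omega(s), \tag{3.25}$$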
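whence (3.2) follows by inversion. This completes the proof of Theorem 1.

Examples:

(i) Suppose that $F(x) = 1 - e^{-\lambda x}$ $(x \ge 0)$ and $H(x)$ is arbitrary. In this case $\beta = 1/\lambda$. If $\lambda\alpha < 1$, then

$$\Omega(s) = \frac{1 - \lambda\alpha}{1 - \lambda\,\dfrac{1 - \psi(s)}{s}} \tag{3.26}$$

and thus by (3.25) $\Omega^*(s) = \Omega(s)$, i.e., $W^*(x) = W(x)$.

(ii) Suppose that $F(x)$ is arbitrary and $H(x) = 1 - e^{-\mu x}$ $(x \ge 0)$. In this case $\alpha = 1/\mu$. If $\mu\beta > 1$, then

$$\Omega(s) = \frac{(1 - \delta)(s + \mu)}{s + \mu(1 - \delta)} \tag{3.27}$$

where $z = \delta$ is the only root of the equation $z = \varphi(\mu(1 - z))$ inside the unit circle. Now if we suppose that $F(x)$ is not a lattice distribution function, then we obtain by (3.25) that

$$\Omega^*(s) = \Big(1 - \frac{1}{\mu\beta}\Big) + \frac{1}{\mu\beta}\cdot\frac{\mu(1 - \delta)}{s + \mu(1 - \delta)}. \tag{3.28}$$

From (3.27)

$$W(x) = 1 - \delta e^{-\mu(1-\delta)x}, \tag{3.29}$$

and from (3.28)

$$W^*(x) = 1 - \frac{1}{\mu\beta}\,e^{-\mu(1-\delta)x} \quad \text{if } x \ge 0.$$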
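Example (ii) can be evaluated numerically once the root $\delta$ is known. The following is a minimal sketch, not part of the original paper: it assumes that $\varphi$ is the Laplace-Stieltjes transform of $F$, finds $\delta$ by simple fixed-point iteration started at 0 (an assumed, convenient method), and then evaluates the two distribution functions of example (ii). The function names `solve_delta` and `waiting_time_cdfs` are hypothetical.

```python
import math


def solve_delta(phi, mu, tol=1e-12, max_iter=10_000):
    """Fixed-point iteration for the root of z = phi(mu * (1 - z)) in [0, 1).

    phi is assumed to be the Laplace-Stieltjes transform of the
    inter-arrival distribution F; mu is the exponential service rate.
    """
    z = 0.0
    for _ in range(max_iter):
        z_next = phi(mu * (1.0 - z))
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    raise RuntimeError("fixed-point iteration did not converge")


def waiting_time_cdfs(phi, mu, beta):
    """Return (delta, W, W_star) for exponential service times.

    W is the limiting distribution of the actual waiting time and
    W_star that of the virtual waiting time, as in example (ii).
    beta is the mean inter-arrival time; mu * beta > 1 is assumed.
    """
    delta = solve_delta(phi, mu)
    rate = mu * (1.0 - delta)

    def W(x):
        return 1.0 - delta * math.exp(-rate * x) if x >= 0 else 0.0

    def W_star(x):
        return 1.0 - math.exp(-rate * x) / (mu * beta) if x >= 0 else 0.0

    return delta, W, W_star


# Illustrative input: Erlang-2 inter-arrival times with mean beta = 1,
# whose transform is (2 / (2 + s))**2, and service rate mu = 2.
delta, W, W_star = waiting_time_cdfs(lambda s: (2.0 / (2.0 + s)) ** 2, mu=2.0, beta=1.0)
```

For Poisson input the same code gives $\delta = \lambda/\mu$, so that `W` and `W_star` coincide, in agreement with example (i).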
4. THE LIMITING DISTRIBUTION OF THE QUEUE SIZE


The following two theorems are concerned with the limiting distribution of
the queue size. Formulas (4.2) and (4.10) were He ges by Ss ) ‘heuer aide 
munication made in January 1958 during a meeting on ae z ee sein 
tice” arranged by the Institute for Engineering Production r ss a s APN Tel 
mingham, England). Formula (4.10) has also been proved by : ae ieee 
some restrictive conditions the existence of the limiting distribution of the queue size for many-server queues has been investigated by Finch (1959).


Theorem 2: If $\alpha < \beta$ and $F(x)$ is not a lattice distribution function, then the limiting distribution $\lim_{t\to\infty} P\{\xi(t) = k\} = P_k^*$ $(k = 0, 1, \dots)$ exists and is independent of the initial queue size. We have

$$P_0^* = 1 - \frac{\alpha}{\beta} \tag{4.1}$$

and for $k = 1, 2, \dots$

$$P_k^* = \frac{\alpha}{\beta}\int_0^\infty [F_{k-1}(x) - F_k(x)]\,d[W(x) * H^*(x)] \tag{4.2}$$

where $F_k(x)$ denotes the k-th iterated convolution of $F(x)$ with itself; $F_0(x) = 1$ if $x \ge 0$ and $F_0(x) = 0$ if $x < 0$; $W(x)$ is defined by (2.1), $H^*(x)$ is defined by (3.3). If $\alpha > \beta$, then $\lim_{t\to\infty} P\{\xi(t) = k\} = 0$ $(k = 0, 1, \dots)$ irrespective of the initial queue size.


Proof: If we define $A$ as the event that the queue size is $\le k$ $(k = 0, 1, \dots)$ then this event satisfies the assumptions of Lemma 1. Accordingly if $\alpha < \beta$ and $F(x)$ is not a lattice distribution function, then the limiting distribution $\lim_{t\to\infty} P\{\xi(t) \le k\}$ exists and is independent of the initial state. If $\alpha > \beta$, then $\lim_{t\to\infty} P\{\xi(t) \le k\} = 0$ $(k = 0, 1, \dots)$. This follows from (4.13) which will be proved later.
Remark 3: We shall give a direct proof for the case $\alpha > \beta$. Denote by $\delta(t)$ the number of departures in the time interval $(0, t]$ and by $\nu(t)$ the number of arrivals in $(0, t]$. Then $\xi(t) = \xi(0) + \nu(t) - \delta(t)$, whence

$$\frac{\xi(t)}{t} = \frac{\xi(0)}{t} + \frac{\nu(t)}{t} - \frac{\delta(t)}{t}. \tag{4.3}$$

By Doob's theorem $\lim_{t\to\infty} \nu(t)/t = 1/\beta$ and $\limsup_{t\to\infty} \delta(t)/t \le 1/\alpha$ with probability one. Since $\lim_{t\to\infty} \xi(0)/t = 0$ with probability 1 we obtain from (4.3) that

$$\liminf_{t\to\infty} \frac{\xi(t)}{t} \ge \frac{1}{\beta} - \frac{1}{\alpha} > 0$$

with probability one, i.e.,

$$P\{\lim_{t\to\infty} \xi(t) = \infty\} = 1. \tag{4.4}$$
To find $P_k^*$ for $k = 1, 2, \dots$ we can write that

$$P\{\xi(t) = k\} = Q_k(t) + \int_0^t\!\!\int_0^{t-u} [F_{k-1}(t-u) - F_k(t-u)]\,[1 - H(t-u-y)]\; d_y P\{\eta(u) \le y \mid \theta(u) = 0\}\,dm(u), \tag{4.5}$$

where $Q_k(t)$ is the probability that the queue size is $k$ at time $t$ and the customer being served at time $t$ did not arrive in the interval $(0, t]$. The second term on the right hand side of (4.5) can be obtained in the following way: The customer being served at time $t$ arrives at time $u$ $(0 \le u \le t)$, his waiting time is $y$ $(0 \le y \le t-u)$ and in the interval $(u, t]$ $k-1$ customers arrive. If $\alpha < \beta$ and $F(x)$ is not a lattice distribution function, then

$$\lim_{u\to\infty} P\{\eta(u) \le y \mid \theta(u) = 0\} = W(y)$$

where $W(y)$ is defined by (2.1). By using (3.22) we obtain from (4.5) that $\lim_{t\to\infty} P\{\xi(t) = k\} = P_k^*$ where $P_k^*$ is given by (4.2) for $k = 1, 2, \dots$. If $k = 0$, then $P_0^* = 1 - \alpha/\beta$, because

$$\lim_{t\to\infty} P\{\xi(t) = 0\} = \lim_{t\to\infty} P\{\eta(t) = 0\} = P_0^*. \tag{4.6}$$
Remark 4: If

$$W_1 = \int_0^\infty x\,dW(x) \tag{4.7}$$

is finite, then

$$\sum_{k=0}^{\infty} kP_k^* = \frac{W_1 + \alpha}{\beta}. \tag{4.8}$$

An intuitive proof is as follows: Obviously

$$\int_0^t \xi(u)\,du - \sum_{i=1}^{\nu(t)} (\eta_i + \chi_i)$$

is bounded with probability 1, i.e.,

$$\lim_{t\to\infty} \frac{1}{t}\int_0^t \xi(u)\,du = \lim_{t\to\infty} \frac{\nu(t)}{t}\cdot\frac{1}{\nu(t)}\sum_{i=1}^{\nu(t)} (\eta_i + \chi_i) = \frac{W_1 + \alpha}{\beta} \tag{4.9}$$

with probability one. For, by (3.10) $\nu(t)/t \to 1/\beta$ with probability 1, by the strong law of large numbers

$$\lim_{t\to\infty} \frac{1}{\nu(t)}\sum_{i=1}^{\nu(t)} \chi_i = \alpha$$

with probability 1, and by the ergodic theorem

$$\lim_{t\to\infty} \frac{1}{\nu(t)}\sum_{i=1}^{\nu(t)} \eta_i = W_1$$

with probability 1. If we suppose that $\{\xi(t)\}$ is a stationary process, then $P\{\xi(t) = k\} = P_k^*$ for all $t \ge 0$, and if we form the expectation of (4.9) we obtain (4.8).
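Relation (4.8) can be checked against a case where every quantity is known in closed form. The sketch below is an illustration added here, not part of the original argument. It assumes Poisson arrivals with rate `lam` and exponential service with rate `mu`, for which the standard results $P_k^* = (1-\rho)\rho^k$ and $W(x) = 1 - \rho e^{-(\mu-\lambda)x}$ with $\rho = \lambda/\mu$ hold; the function name is hypothetical.

```python
def mm1_mean_queue_check(lam, mu, terms=2000):
    """Compare both sides of (4.8) for Poisson input and exponential service.

    Assumes lam < mu. Returns (lhs, rhs) where lhs is a truncated sum of
    k * P_k^* and rhs is (W_1 + alpha) / beta.
    """
    rho = lam / mu
    alpha = 1.0 / mu          # mean service time
    beta = 1.0 / lam          # mean inter-arrival time
    w1 = rho / (mu - lam)     # mean of W(x) = 1 - rho * exp(-(mu - lam) * x)
    lhs = sum(k * (1.0 - rho) * rho ** k for k in range(terms))
    rhs = (w1 + alpha) / beta
    return lhs, rhs


lhs, rhs = mm1_mean_queue_check(lam=1.0, mu=2.0)
```

Both values agree up to the truncation error of the sum, which is negligible for the chosen number of terms.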
Theorem 3: If $\alpha < \beta$, then $\lim_{n\to\infty} P\{\xi_n = k\} = P_k$ $(k = 0, 1, \dots)$ exists, independent of the initial queue size and we have

$$P_k = \int_0^\infty [F_k(x) - F_{k+1}(x)]\,d[W(x) * H(x)] \tag{4.10}$$

where $W(x)$ is defined by (2.1). If $\alpha > \beta$, then $\lim_{n\to\infty} P\{\xi_n = k\} = 0$ for every $k$ irrespective of the initial queue size.


Proof: The event $\xi_n \le k$ occurs if and only if the n-th arriving customer departs before the $(n+k+1)$-st customer arrives, i.e., if and only if the queue size immediately after the departure of the n-th arriving customer is $\le k$. Thus for arbitrary initial queue size $\xi(0)$ we have

$$P\{\xi_n \le k\} = \int_0^\infty [1 - F_{k+1}(x)]\,d[W_n(x) * H(x)] \tag{4.11}$$

where $W_n(x) = P\{\eta_n \le x\}$, because the queue size immediately after the departure of the n-th arriving customer is equal to the number of arrivals during the waiting time and the service time of the n-th arriving customer.

If $\alpha < \beta$, then $\lim_{n\to\infty} W_n(x) = W(x)$ and by (4.11)

$$\lim_{n\to\infty} P\{\xi_n \le k\} = \int_0^\infty [1 - F_{k+1}(x)]\,d[W(x) * H(x)] \tag{4.12}$$

which proves (4.10). If $\alpha > \beta$, then $\lim_{n\to\infty} W_n(x) = 0$ for every $x$ and by (4.11)

$$\lim_{n\to\infty} P\{\xi_n \le k\} = 0 \tag{4.13}$$

for every $k$.
THE CAPACITY OF AN INDECOMPOSABLE CHANNEL*


By J. WOLFOWITZ 
Cornell University 


SUMMARY. The strong converse of the coding theorem for an indecomposable channel is proved, thus establishing the capacity of the channel. An approximating result is proved which makes the capacity computable.


1. DESCRIPTION OF AN INDECOMPOSABLE CHANNEL 


Let $A = \{1, \dots, a\}$ be the input alphabet and $B = \{1, \dots, b\}$ the output alphabet of the channel. (What these terms are will be apparent shortly.) To avoid the trivial both $a$ and $b$ should be greater than one. A sequence of $n$ "letters," each an element (integer) in $A$ (respectively, in $B$) is called a transmitted (respectively, a received) n-sequence, also sometimes a word or an n-word. All words sent (i.e., transmitted) and received will have length $n$ (i.e., consist of $n$ letters), where $n$ is an arbitrary integer.


An indecomposable channel is characterized by the way it transmits any transmitted n-sequence, and this can be described with the aid of $a$ stochastic $b \times b$ matrices $D_1, \dots, D_a$. The element of $D_k$ in the i-th row and j-th column will be called $d(k, i, j)$, $k = 1, \dots, a$; $i, j = 1, \dots, b$, and the $D$'s are subject to a restriction which will be described below. Let

$$u_0 = (x_1, \dots, x_n)$$

be any transmitted n-sequence and

$$v_0 = (y_1, \dots, y_n)$$

be any received n-sequence. When $u_0$ is sent or transmitted over the channel the n-sequence then received is a chance sequence (one whose components are chance variables), say

$$v(u_0) = (Y_1(u_0), \dots, Y_n(u_0)).$$

The description of the channel will be complete when we have defined the probability that $v(u_0) = v_0$ (for any $u_0$ and $v_0$). This probability depends upon the "initial state" of the channel (i.e., the state when the transmission of $u_0$ begins), and this state is an element of $B$, say $h$. Then we define, for $h = 1, \dots, b$,

$$P\{v(u_0) = v_0 \mid h\} = d(x_1, h, y_1) \prod_{m=2}^{n} d(x_m, y_{m-1}, y_m). \tag{1.1}$$

The specification (1.1) can be thought of as follows: At the beginning of the transmission of $u_0$ the channel is in state $h$, either because this was the last letter received in the previous transmission or for some other reason. The first letter sent is $x_1$. Employing the stochastic matrix $D_{x_1}$, we have that the probability of the channel's moving from state $h$ to state $y_1$ (i.e., of the letter's $y_1$ being received) is $d(x_1, h, y_1)$. After the movement the channel is in state $y_1$, and the procedure is repeated, this time starting from state $y_1$ and using $D_{x_2}$, because $x_2$ is the second letter sent. Successive movements take place in the same manner. The right member of (1.1) shows that, for given $Y_1(u_0), \dots, Y_m(u_0)$, the conditional distribution of $Y_{m+1}(u_0)$ depends only on $Y_m(u_0)$ and the $(m+1)$-st element $x_{m+1}$ of $u_0$.

*Research supported by the Office of Scientific Research and Development of the U.S. Air Force.
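The product in (1.1) translates directly into a short loop over the letters. The following is a minimal sketch for illustration only: it assumes zero-based letters and states (the text uses 1, ..., a and 1, ..., b), stores the matrices as nested lists with `D[x][i][j]` playing the role of $d(x, i, j)$, and uses two made-up matrices as example data.

```python
import itertools


def sequence_probability(D, u0, v0, h):
    """P{v(u0) = v0 | h} according to (1.1).

    D[x][i][j] is the probability of moving from state i to state j when
    letter x is sent. u0 is the transmitted sequence, v0 the received
    sequence, h the initial state; all indices are zero-based.
    """
    if len(u0) != len(v0):
        raise ValueError("transmitted and received sequences must have equal length")
    prob = 1.0
    state = h
    for x, y in zip(u0, v0):
        prob *= D[x][state][y]   # one factor d(x_m, y_{m-1}, y_m)
        state = y                # the received letter becomes the new state
    return prob


# Hypothetical example with a = 2 input letters and b = 2 states.
D = [
    [[0.9, 0.1], [0.2, 0.8]],   # matrix used when letter 0 is sent
    [[0.5, 0.5], [0.3, 0.7]],   # matrix used when letter 1 is sent
]
p = sequence_probability(D, u0=(0, 1), v0=(1, 0), h=0)
total = sum(sequence_probability(D, (0, 1), v, 0)
            for v in itertools.product(range(2), repeat=2))
```

Here `p` is the product $d(0, 0, 1)\,d(1, 1, 0)$ of the two relevant entries, and `total` sums (1.1) over all received 2-sequences, so it should equal one.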


A code $(n, N, \lambda)$ for an indecomposable channel is a system

$$\{(u_1, A_1), \dots, (u_N, A_N)\} \tag{1.2}$$

where $u_1, \dots, u_N$ are transmitted n-sequences, $A_1, \dots, A_N$ are disjoint sets of received n-sequences, and

$$P\{v(u_i) \in A_i \mid h\} \ge 1 - \lambda, \qquad i = 1, \dots, N;\ h = 1, \dots, b. \tag{1.3}$$

The reason for the name "code" and the use to which a code can be put are obvious. (For more about codes see, for example, Wolfowitz (1961).) $N$ is called the length of the code, $\lambda$ is called the probability of error, and $n$ is the length of each word.


A stochastic matrix is called indecomposable if, in the terminology of Doob 
(1953, p. 179), it contains only one ergodic class, or, in the terminology of Feller 
(1957, p. 355), it contains at most one closed set of states other than the set of all 
states. It is called aperiodic (ibid.) if it has period 1. Let $D$ be a stochastic indecomposable aperiodic (SIA) matrix. Then it is proved in books on Markov chains [e.g., (Doob, 1953) and (Feller, 1957)] that $D^n$ approaches, as $n \to \infty$, a stochastic matrix all of whose rows are identical, and conversely, if $D$ is a stochastic matrix such that $D^n$ approaches a (stochastic) matrix all of whose rows are identical then


D is SIA. This property could be used to furnish an alternate definition of an SIA 
matrix. 
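The convergence of $D^n$ to a matrix with identical rows can be observed numerically. The sketch below is only an illustration under stated assumptions: it measures how far the rows of a matrix are from being identical by the largest difference between two entries of the same column, and tabulates this quantity for successive powers. It does not decide whether a matrix is SIA; the function names and the example matrices are made up.

```python
def mat_mul(P, Q):
    """Product of two square matrices given as nested lists."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def row_spread(M):
    """Largest difference between two entries in the same column of M.

    The value is 0 exactly when all rows of M are identical.
    """
    return max(max(col) - min(col) for col in zip(*M))


def spread_of_powers(D, n_max):
    """Return [row_spread(D), row_spread(D^2), ..., row_spread(D^n_max)]."""
    spreads = []
    power = D
    for _ in range(n_max):
        spreads.append(row_spread(power))
        power = mat_mul(power, D)
    return spreads


# An aperiodic example: the spread shrinks geometrically.
shrinking = spread_of_powers([[0.9, 0.1], [0.2, 0.8]], 10)
# A periodic example: the powers alternate and the rows never agree.
periodic = spread_of_powers([[0, 1], [1, 0]], 5)
```

For the first matrix the spread of the n-th power is $0.7^n$, while for the periodic matrix it stays equal to 1.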


A channel is called indecomposable if (and only if) every product of any number 
of the matrices D (with replications permitted, of course) is itself SIA. (Of course, 
then, each D, being itself such a product, must be SIA.) Indecomposable channels 
were introduced by Blackwell, Breiman, and Thomasian (1958). 
Thomasian gave a finite algorithm for deciding whether a given finite set of SIA matrices is such that every product of these matrices is itself SIA (see also Wolfowitz (1963)).


2. CODING THEOREMS

Hereafter we assume that we are dealing with an indecomposable channel. All logarithms will be assumed to be to the base 2; this is due to a convention in information theory which has lost all significance but is innocuous. Whenever in what follows an algebraic expression formally appears to be $0 \log 0$ the latter value is to be considered 0.
Let $l$ be any integer, and let $h$ designate the initial state of the channel. Let

$$U = (X_1, \dots, X_l)$$

be a chance transmitted l-sequence, i.e., $X_1, \dots, X_l$ are chance variables (not in general independent) with values in $A$. Let $Q'$ be the distribution of $U$; i.e., if $u'$ is any transmitted l-sequence then

$$Q'(u') = P\{U = u'\}.$$

Let $V$ be the chance l-sequence received when $U$ is transmitted (the initial state of the channel is $h$). Let $Q$ be the joint distribution of $(U, V)$ and $Q''$ be the (marginal) distribution of $V$. Thus, if $v'$ is any received l-sequence, then

$$Q(u', v') = P\{U = u', V = v'\},$$
$$Q''(v') = P\{V = v'\} = \sum_{u'} Q(u', v').$$

Of course $Q'$ determines both $Q$ and $Q''$ (for a given channel). Define

$$I(Q', h) = \sum_{u', v'} Q(u', v') \log \frac{Q(u', v')}{Q'(u')\,Q''(v')}. \tag{2.1}$$

(Clearly, $Q$ and $Q''$ depend upon $h$.) Finally, define

$$G(l) = \frac{1}{l}\,\max_{Q'} \min_{h} I(Q', h) \tag{2.2}$$

and

$$C = \sup_{l} G(l). \tag{2.3}$$
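For small alphabets and short sequences the quantity $I(Q', h)$ of (2.1) can be computed by direct enumeration. The following is a rough sketch under stated assumptions, not a method from the paper: letters and states are zero-based, the matrices are nested lists with `D[x][i][j]` standing for $d(x, i, j)$, the input distribution $Q'$ is a dictionary from input tuples to probabilities, and the example channel is made up. The maximization over $Q'$ in (2.2) is not attempted.

```python
import itertools
import math


def seq_prob(D, u, v, h):
    """P{V = v | U = u, initial state h}, following (1.1)."""
    prob, state = 1.0, h
    for x, y in zip(u, v):
        prob *= D[x][state][y]
        state = y
    return prob


def information(D, q_input, h, l):
    """I(Q', h) of (2.1) in bits, by enumerating all received l-sequences.

    q_input maps each transmitted l-tuple with positive probability to
    that probability; terms with zero joint probability contribute 0.
    """
    b = len(D[0])
    joint = {}
    for u, pu in q_input.items():
        for v in itertools.product(range(b), repeat=l):
            joint[(u, v)] = pu * seq_prob(D, u, v, h)
    marginal_v = {}
    for (u, v), p in joint.items():
        marginal_v[v] = marginal_v.get(v, 0.0) + p
    total = 0.0
    for (u, v), p in joint.items():
        if p > 0.0:
            total += p * math.log2(p / (q_input[u] * marginal_v[v]))
    return total


# Hypothetical channel in which the received letter always equals the sent one.
D_clean = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
uniform_1 = {(0,): 0.5, (1,): 0.5}
bits = [information(D_clean, uniform_1, h, 1) for h in range(2)]
```

For this noiseless example and a uniform $Q'$ on single letters, the value is one bit for either initial state, so the minimum over $h$ in (2.2) is also one bit.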
The following theorems were proved by Blackwell, Breiman, and Thomasian (1958); see also Wolfowitz (1961, p. 75).

Theorem 1: Let $\epsilon > 0$ and $\lambda$, $0 < \lambda \le 1$, be arbitrary. For $n$ sufficiently large there exists a code

$$(n, 2^{n(C - \epsilon)}, \lambda)$$

for the indecomposable channel.

Theorem 2: For any $n$ a code $(n, N, \lambda)$ for the indecomposable channel satisfies

$$\log N < \frac{nC + 1}{1 - \lambda}.$$

In this paper we will prove

Theorem 3: Let $\epsilon > 0$ and $\lambda$, $0 \le \lambda < 1$, be arbitrary. For all $n$ sufficiently large there does not exist a code

$$(n, 2^{n(C + \epsilon)}, \lambda)$$

for the indecomposable channel.
Theorem 2 is a so-called weak converse. Theorem 3 is a so-called strong converse of the coding theorem. Theorems 1 and 3 justify us in calling $C$ the "capacity" of the indecomposable channel (Wolfowitz, 1961, p. 59).

It seems desirable that the determination of the capacity of the channel should be such that it is possible, at least in principle, to compute the capacity to within any specified accuracy. The expression (2.3) does not meet this requirement, but Theorem 4, to be stated and proved in Section 3, does enable us to do this.


Proof of Theorem 3: For any stochastic matrix $M = \{m_{ij}\}$, define

$$\delta(M) = \max_{j} \max_{i_1, i_2} |m_{i_1 j} - m_{i_2 j}|.$$

Thus $\delta(M) = 0$ implies that all rows of $M$ are identical, and $\delta(M)$ small implies that they are almost identical. Essential to our proof will be the following result, proved in Wolfowitz (1963).

Theorem A: Let $M_1, \dots, M_k$ be (finite, square) stochastic matrices such that the product of any number of $M$'s (repetition permitted) is SIA. Let $\eta > 0$ be arbitrary. There exists an integer $r(\eta)$ such that every product $R$ of at least $r(\eta)$ $M$'s satisfies

$$\delta(R) < \eta. \tag{2.4}$$

We now define

$$c(\eta) = \max_{\substack{|x - x'| \le \eta \\ 0 \le x \le x' \le 1}} |x \log x - x' \log x'|$$

and, for $\eta > 0$ and $t$ a positive integer, let

$$z(\eta, t) = b\eta(\log a + 2\log b) + \frac{2b\,c(\eta)}{t}.$$

Let the (arbitrary) $\epsilon$ of Theorem 3 be given. Let $r$ and $t$ be positive integers which will be further described later. We will prove Theorem 3 for all sufficiently large $n$ of the form $k(r+t)$, with $k$ an integer. The proof for all sufficiently large $n$ will then be trivial.


Let the system (1.2) be a code (n, N,A) 
ASI. Let vo Say, be any se 
received n-sequences which coi 
the form 


for the indecomposable channel, with 


quence in any one of the A’s, say A;. To A; add all 
neide with v in t 


for all v in some A;, and designate A, afte 
by Aj. It is clear that each sequence i 
Let $u_i$ be any transmitted sequence of the code (1.2), and let $v(u_i)$ be the chance received sequence. Let $k' \le k - 1$ be any integer, and let $t' = k'(r+t) + r$. When the $(t'+1)$-st element of $u_i$ is being transmitted the channel is in state $Y_{t'}(u_i)$ whose distribution depends upon the first $t'$ elements of $u_i$ and the state of the channel at the beginning of the transmission of $u_i$. Equivalently, the distribution of $Y_{t'}(u_i)$ is determined by $Y_{k'(r+t)}(u_i)$ and the $r$ elements of $u_i$ which precede the $(t'+1)$-st.

Consider now the following channel $K$: Its input alphabet consists of sequences of $(r+t)$ letters from $A$. Its output alphabet consists of sequences of $t$ letters from $B$. Let $c$ (respectively $d$) be any element of the input (respectively output) alphabet of channel $K$. Define $P\{v(c) = d\}$ according to the channel $K$ to be equal to the probability that, when the $(r+t)$-sequence $c$ is sent over the original indecomposable channel, the last $t$ elements of the received $(r+t)$-sequence are $d$. This probability depends upon the initial state of the channel at the beginning of transmission of each letter (of the input alphabet of channel $K$), i.e., except in the case of the first letter of a word, upon the previous received letter. Thus the channel has memory. However, we will regard it as a memoryless channel and leave an ambiguity in the above definition of the channel probability function. It will be shown that, under certain conditions which will prevail, this ambiguity will be of so little consequence that the desired result will be achieved. As an example of how a memoryless channel might be achieved we give the following: After the transmission of any (input) letter of the channel $K$ the channel is considered to be restored to a given fixed state.

Suppose temporarily that the operation of the original indecomposable channel is changed only for the first letter (transmitted and received) as follows: There is given a stochastic $b \times b$ matrix $Z = \{z_{hj}\}$ such that $\delta(Z) \le \eta$. If $h$ is the initial state of the channel let the probability that the first received letter will be $j$ $(j = 1, \dots, b)$ be $z_{hj}$. After the first letter is received the channel behaves like the indecomposable channel which was originally defined. For this modified channel we will shortly prove that

$$\max_{h} I(Q', h) - \min_{h} I(Q', h) \le l\,z(\eta, l), \tag{2.5}$$

independent of $Z$.
Let 7(7) be sufficiently large so that 


t follows, from (2.5), the choice 
is less than 


is true. 
., Dë Then i 
that the capacity of the latter 


Assume temporarily that (2.5) 
aced by Dy + 


(2.4) holds with the M’s rep! 

of r, and the definition of the channel K, : 
baton 9) see (2.6) 
r+t 


nt as in [Wolfowitz (1961), Section 5.4], we obtain 


By essentiall e argume 
s y the sam Ber 
from (2.6) that, for all $ sufficiently large (n= k(r-+t))s | 
pr ke(G(ey elt zm 0) (2.7) 
N<b 
Now choose $\eta$ so small that $z(\eta, t) < \epsilon/4$, and then $t$ so large that

$$\frac{r \log b}{r + t} < \frac{\epsilon}{4}.$$

Theorem 3 then follows from (2.7) when $n$ is of the form $k(r+t)$. The theorem is then obvious for general $n$, as in [Wolfowitz (1961), equation (5.4.6)].


It remains to prove (2.5). We consider

$$\sum_{u', v'} Q(u', v') \log Q(u', v') = F(h), \text{ say}, \tag{2.8}$$

for two different initial states, $h_1$ and $h_2$. Now $Q(u', v')$ equals $Q'(u')$ (which does not depend on $h$), multiplied by the conditional probability $\alpha_h(u', m)$ of the first letter $m$ (say) of $v'$, multiplied by $\beta(u', v')$, the conditional (upon the two preceding events) probability of the sequence $q$ (say) of the last $(l-1)$ letters of $v'$ (which also does not depend on $h$). (Sometimes, when it will make the arguments clearer, we will write $\alpha_h(u', v')$ and $\beta(u', m, q)$.) From the definition of the modified indecomposable channel we have that

$$|\alpha_{h_1}(u', m) - \alpha_{h_2}(u', m)| \le \eta. \tag{2.9}$$

Hence
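$$|F(h_1) - F(h_2)|$$
$$= \Big|\sum_{u', v'} Q_{h_1}(u', v')\,[\log(Q'(u')\beta(u', v')) + \log \alpha_{h_1}(u', v')] - \sum_{u', v'} Q_{h_2}(u', v')\,[\log(Q'(u')\beta(u', v')) + \log \alpha_{h_2}(u', v')]\Big|$$
$$\le \eta\,\Big|\sum_{u', v'} [Q'(u')\beta(u', v')] \log[Q'(u')\beta(u', v')]\Big| + \sum_{u', v'} [Q'(u')\beta(u', v')]\,\big|\alpha_{h_1}(u', v') \log \alpha_{h_1}(u', v') - \alpha_{h_2}(u', v') \log \alpha_{h_2}(u', v')\big|. \tag{2.10}$$

Also,

$$\sum_{u', v'} Q'(u')\beta(u', v') = \sum_{m} \sum_{u'} \sum_{q} Q'(u')\beta(u', m, q) = \sum_{m} \sum_{u'} Q'(u') = b. \tag{2.11}$$

Thus the second term of the right member of (2.10) is not greater than $b\,c(\eta)$. The first term is $\eta$ times the absolute value of

$$\sum_{u', v'} Q'(u')\beta(u', v') \log[Q'(u')\beta(u', v')]. \tag{2.12}$$

From (2.11) and [Wolfowitz (1961), (2.2.4)], we obtain that the absolute value of (2.12) is not greater than

$$b \log (ab)^l = l\,b \log ab. \tag{2.13}$$

Thus we have that the right member of (2.10) is less than

$$l\eta b \log ab + b\,c(\eta). \tag{2.14}$$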


The expression

$$\sum_{u', v'} Q(u', v') \log Q'(u') \tag{2.15}$$

which enters into $I(Q', h)$ does not depend upon $h$. It remains therefore to consider

$$S(h) = \sum_{u', v'} Q_h(u', v') \log Q_h''(v') = \sum_{v'} Q_h''(v') \log Q_h''(v')$$
$$= \sum_{m, q} \Big(\sum_{u'} Q'(u')\alpha_h(u', m)\beta(u', m, q)\Big) \log \Big(\sum_{u'} Q'(u')\alpha_h(u', m)\beta(u', m, q)\Big). \tag{2.16}$$

Define $T_h(v')$ by

$$T_h(v') \sum_{u'} Q'(u')\beta(u', v') = \sum_{u'} Q_h(u', v'). \tag{2.17}$$

Obviously, for all $v'$ and $h$,

$$0 \le T_h(v') \le 1 \tag{2.18}$$

and, from (2.9) and (2.17),

$$|T_{h_1}(v') - T_{h_2}(v')| \le \eta. \tag{2.19}$$

Thus

$$S(h) = \sum_{m, q} T_h(v') \log T_h(v') \sum_{u'} Q'(u')\beta(u', m, q) + \sum_{m, q} T_h(v') \Big(\sum_{u'} Q'(u')\beta(u', m, q)\Big) \log \Big(\sum_{u'} Q'(u')\beta(u', m, q)\Big)$$
$$= V_1(h) + V_2(h) \text{ (say)}. \tag{2.20}$$

From (2.11), (2.19) and [Wolfowitz (1961), (2.2.4)], we obtain that

$$|V_2(h_1) - V_2(h_2)| \le l\eta b \log b. \tag{2.21}$$

From (2.11) and (2.18) we obtain that

$$|V_1(h_1) - V_1(h_2)| \le b\,c(\eta). \tag{2.22}$$

Finally, from (2.14), (2.21) and (2.22) we obtain (2.5). This completes the proof of Theorem 3.
AcH TO C: THEOREM 4 


Jlest integer such that (2.4) holds with the J's replaced 
mu T ve positive integer. From (2.7) and Theorem 1 we bbn 
by Dy, ..., De Let t be ani 
; r iently large 
that, for n sufficients nan ay oe 


ko) log 
fe ot (n) n) 


i from (3.1) and the definition 


de was arbitrar y. Hence 
nan nce, 
e 


But (3.1) does not involv 
og ptet t) 4G) kë 


of Q(t) we have ae | 
ay <es— Hr) 
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When 7 is sufficiently small and tis (then) sufficiently large the expression 


_ 1*0) log b-+-t2(y, t) Në 
I(q, t) = ao 


can be made as small as desired. ‘Thus we have proved 


(3.3) 


Theorem 4: Let C be the capacity of an indecomposable channel. Let y>0 


and the positive integer t both be arbitrary. Then 


jis — PJ) s (8.4 
Ct) < Jin, t) TËR)” Ga 
For 7 sufficiently small and then t (depending on n) sufficiently 1 


arge the right member 
of (3.4) can be made arbitrarily small. 


Theorem 4 enables us, a 


t least in principle, to compute C to any specified 
approximation. 
COMBINATORIAL PROPERTIES OF PARTIALLY BALANCED
DESIGNS AND ASSOCIATION SCHEMES* 


By R. C. BOSE 
University of North Carolina 


1. DEFINITION OF ASSOCIATION SCHEMES AND PARTIALLY BALANCED
INCOMPLETE BLOCK (PBIB) DESIGNS

Partially balanced designs were first introduced in 1939 by the author in collaboration with Nair (1939) as an extension of the balanced incomplete block (BIB) [Yates, 1936b; Fisher and Yates, 1938; Bose, 1939] and lattice (Yates, 1936a) designs. During the twenty four years which have elapsed since then, much theoretical work has been done on the subject and the designs have also been applied to various practical problems. As the initial work was done by both the authors when they were members of the Indian Statistical Institute founded by Professor P. C. Mahalanobis, it seems particularly appropriate to discuss these designs in this volume. The definition of PBIB designs was slightly generalized by Nair and Rao (1942), so as to include as special cases the cubic and other higher dimensional lattices. A further step was taken by Bose and Shimamoto (1952) in introducing the concept of association schemes, and basing the definition of PBIB design on these schemes. Since most of the recent work has been on the combinatorial properties of association schemes and PBIB designs based on them, we shall confine ourselves to this topic and leave out the analysis of the designs and methods of constructing them.
Given $v$ treatments, $1, 2, \dots, v$, a relation satisfying the following conditions is said to be an association scheme with $m$ classes:

(a) Any two treatments are either 1st, 2nd, ..., or m-th associates, the relation of association being symmetrical, i.e., if the treatment $\alpha$ is the i-th associate of the treatment $\beta$, then $\beta$ is the i-th associate of the treatment $\alpha$.

(b) Each treatment $\alpha$ has $n_i$ i-th associates, the number $n_i$ being independent of $\alpha$.

(c) If any two treatments $\alpha$ and $\beta$ are i-th associates then the number of treatments which are j-th associates of $\alpha$ and k-th associates of $\beta$ is $p^i_{jk}$, and is independent of the pair of i-th associates $\alpha$ and $\beta$.

The numbers

$$v,\ n_i,\ p^i_{jk} \quad (i, j, k = 1, 2, \dots, m), \tag{1.1}$$

are the parameters of the association scheme.


If we have an association scheme with m classes, then we get a PBIB design 
with r replications and b blocks based on the association scheme, if we can arrange 
the v treatments into b blocks such that 

(i) each block contains k treatments (all different), 


(ii) each treatment is contained in r blocks, 


(iii) if two treatments $\alpha$ and $\beta$ are i-th associates, then they occur together in $\lambda_i$ blocks, the number $\lambda_i$ being independent of the particular pair of i-th associates $\alpha$ and $\beta$ $(i = 1, 2, \dots, m)$.

For a PBIB design based on any association scheme, the parameters of the scheme may be called parameters of the first kind, and the additional parameters

$$b,\ r,\ k,\ \lambda_i \quad (i = 1, 2, \dots, m), \tag{1.2}$$

may be called parameters of the second kind. Clearly

$$vr = bk, \qquad n_1\lambda_1 + n_2\lambda_2 + \dots + n_m\lambda_m = r(k - 1). \tag{1.3}$$


2. RELATIONS AMONG THE PARAMETERS OF ASSOCIATION SCHEMES 


By definition the number $p^i_{jk}$ is independent of which pair $\alpha$, $\beta$ of i-th associates we start with. Considering the pair $\beta$, $\alpha$ we see at once that

$$p^i_{jk} = p^i_{kj}. \tag{2.1}$$

The following further relations are easy to prove:

$$\sum_{i=1}^{m} n_i = v - 1; \tag{2.2}$$

$$\sum_{k=1}^{m} p^i_{jk} = \begin{cases} n_j & \text{if } i \ne j, \\ n_j - 1 & \text{if } i = j; \end{cases} \tag{2.3}$$

$$n_i\,p^i_{jk} = n_j\,p^j_{ik} = n_k\,p^k_{ij}. \tag{2.4}$$

These relations were proved by Bose and Nair (1939), in their paper introducing the PBIB designs. These are all the relations in case $m = 2$ but for $m \ge 3$ further relations were discovered by Bose and Mesner (1959), and will be discussed in a subsequent section.
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Iti A 

na t is useful to make a convention that each treatment is its own zeroth associ 

te and of no other treatments. Then clearly we must take. an 
2 


2 Pë =l; dr (2.5) 
Pji = Py = 9 if t Aj, 

— Nj if i=j; Sea (2:6) 

pio = Për = 0 if igk, 

=1 if i—k. k CI 
We can now write (2.2) and (2.3) as

$\sum_{i=0}^{m} n_i = v$, ... (2.8)

$\sum_{k=0}^{m} p^i_{jk} = n_j$ ... (2.9)

for i, j, k = 0, 1, ..., m. It should also be noted that (2.4) remains valid if one or more of i, j, k is zero.


Also for a PBIB design based on the association scheme we must have

$\sum_{i=0}^{m} n_i\lambda_i = rk$, where $\lambda_0 = r$. ... (2.10)
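
The relations (2.1) to (2.4) are easy to check numerically for any proposed set of two class parameters. The following is a minimal sketch, not part of the original paper: the function names `check_two_class` and `triangular` are hypothetical, and the test parameters are those of the triangular scheme of Section 4(b), formulas (4.6) and (4.7).

```python
def check_two_class(v, n, P):
    """Check (2.1)-(2.4) for a two class scheme.

    n = (n1, n2); P[i][j][k] stands for p^{i+1}_{j+1,k+1}.
    """
    ok = sum(n) == v - 1                                        # (2.2)
    for i in range(2):
        for j in range(2):
            row_sum = P[i][j][0] + P[i][j][1]
            ok = ok and row_sum == n[j] - (1 if i == j else 0)   # (2.3)
            for k in range(2):
                ok = ok and P[i][j][k] == P[i][k][j]             # (2.1)
                ok = ok and (n[i] * P[i][j][k]
                             == n[j] * P[j][i][k]
                             == n[k] * P[k][i][j])               # (2.4)
    return ok


def triangular(m):
    """Parameters (4.6), (4.7) of the triangular scheme on an m x m square."""
    v = m * (m - 1) // 2
    n = (2 * m - 4, (m - 2) * (m - 3) // 2)
    P1 = [[m - 2, m - 3], [m - 3, (m - 3) * (m - 4) // 2]]
    P2 = [[4, 2 * m - 8], [2 * m - 8, (m - 4) * (m - 5) // 2]]
    return v, n, [P1, P2]


all_ok = all(check_two_class(*triangular(m)) for m in range(4, 13))
# A deliberately wrong parameter set (n1 altered) should be rejected.
bad_ok = check_two_class(10, (5, 3), triangular(5)[2])
```

Such a check only establishes that the necessary relations hold; it does not by itself show that a scheme with those parameters exists.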


3. A LESS DEMANDING DEFINITION OF TWO CLASS ASSOCIATION SCHEMES


The definition given in Section 1, for association schemes is not minimal, i.e. the constancy of some of the parameters can be deduced from the others. For two class association schemes Bose and Clatworthy (1955) proved the following lemma :

Lemma 1 : Let there exist a relationship of association among v treatments satisfying the conditions : (a) Any two treatments are either first associates or second associates, (b) each treatment has $n_1$ first associates and $n_2$ second associates, (c) for any pair of treatments which are first associates the number $p^1_{11}$ of treatments common to the first associates of the first, and the first associates of the second is independent of the pair of treatments with which we start.

Then, for every pair of first associates among the v treatments, the numbers $p^1_{12}$, $p^1_{21}$ and $p^1_{22}$ are constants and $p^1_{12} = p^1_{21}$.

Lemma 2 : Let there exist a relationship of association among v treatments satisfying conditions (a) and (b) of Lemma 1, and the condition (c) for any pair of treatments which are second associates, the number $p^2_{11}$ of treatments which are common to the first associates of the first, and the first associates of the second is independent of the pair of treatments with which we start.

Then, for every pair of second associates among the v treatments, the numbers $p^2_{12}$, $p^2_{21}$ and $p^2_{22}$ are constants and $p^2_{12} = p^2_{21}$.

One can ask whether one of the preceding lemmas implies the other. The answer is no. Consider the association scheme with v = 7, for which the first and second associates are shown below :


treatment    first associates    second associates
    1           2, 4, 5, 7            3, 6
    2           3, 5, 6, 1            4, 7
    3           4, 6, 7, 2            5, 1
    4           5, 7, 1, 3            6, 2
    5           6, 1, 2, 4            7, 3
    6           7, 2, 3, 5            1, 4
    7           1, 3, 4, 6            2, 5


Here $n_1 = 4$, $n_2 = 2$. For any pair of treatments $\alpha$, $\beta$ which are second associates $p^2_{11}$ is 3 independently of $\alpha$, $\beta$. Hence Lemma 2 is satisfied with $p^2_{12} = p^2_{21} = 1$, $p^2_{22} = 0$. However, for any two treatments $\alpha$ and $\beta$ which are first associates $p^1_{11}$ is either 1 or 2.
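
The counts quoted for this example can be confirmed by enumeration. The sketch below assumes the cyclic reading of the table, namely that the first associates of treatment i are i+1, i+3, i+4, i+6 (mod 7); the names used are illustrative only.

```python
V = 7
FIRST_STEPS = {1, 3, 4, 6}   # assumed cyclic pattern of the table above


def first_associates(i):
    """First associates of treatment i (treatments are numbered 1..7)."""
    return {(i - 1 + d) % V + 1 for d in FIRST_STEPS}


def common_first(a, b):
    """Number of treatments that are first associates of both a and b."""
    return len(first_associates(a) & first_associates(b))


pairs = [(a, b) for a in range(1, V + 1) for b in range(a + 1, V + 1)]
# Values taken by p^1_11 over all pairs of first associates
p1_11_values = {common_first(a, b) for a, b in pairs if b in first_associates(a)}
# Values taken by p^2_11 over all pairs of second associates
p2_11_values = {common_first(a, b) for a, b in pairs if b not in first_associates(a)}
```

Here `p2_11_values` contains the single value 3, while `p1_11_values` contains both 1 and 2, so the hypothesis of Lemma 2 holds and that of Lemma 1 fails.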

In view of the Lemmas 1 and 2, the condition (c) in the definition of m class association schemes given in Section 1, may be replaced in the special case m = 2, by the condition (c'), for any pair of treatments which are i-th associates the number $p^i_{11}$ for i = 1, 2, of the treatments which are common to the first associates of the first and the first associates of the second is independent of the pair of treatments with which we start.


For a two class association scheme, the values of the parameters $p^i_{jk}$ (i, j, k = 1, 2), may conveniently be written in the form of two symmetric matrices

$P_1 = (p^1_{jk}) = \begin{pmatrix} p^1_{11} & p^1_{12} \\ p^1_{21} & p^1_{22} \end{pmatrix}, \quad P_2 = (p^2_{jk}) = \begin{pmatrix} p^2_{11} & p^2_{12} \\ p^2_{21} & p^2_{22} \end{pmatrix}$. ... (3.1)

4. SOME EXAMPLES OF TWO CLASS ASSOCIATION SCHEMES AND PBIB 
DESIGNS BASED ON THEM 


We shall give below some simple examples of two class association schemes, and a few designs based on them. This enumeration is for illustrative purposes and is not exhaustive.


(a) The group divisible (GD) association scheme. In this case there are mn treatments, which are divided into m groups of n treatments each. Two treatments belonging to the same group are first associates, and two treatments belonging to different groups are second associates. The association scheme can be exhibited by writing down the mn treatments in the form of a rectangular array, the treatments of the same group occupying the same column. It is readily seen that the parameters of the association scheme so obtained are

$v = mn, \quad n_1 = n-1, \quad n_2 = n(m-1)$, ... (4.1)

$P_1 = \begin{pmatrix} n-2 & 0 \\ 0 & n(m-1) \end{pmatrix}, \quad P_2 = \begin{pmatrix} 0 & n-1 \\ n-1 & n(m-2) \end{pmatrix}$. ... (4.2)

For example let

$m = 4, \quad n = 3$. ... (4.3)

The corresponding GD association scheme is

1   2   3   4
5   6   7   8
9  10  11  12.

The first associates of the treatment 1 are 5 and 9, and the second associates are 2, 3, 4, 6, 7, 8, 10, 11, 12.
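
The parameters (4.1), (4.2) for this example can be recovered by brute force from the array. The sketch below is illustrative: `two_class_parameters` is a hypothetical helper, and it assumes the case m = 4, n = 3 with the twelve treatments written row by row in four columns, so that two treatments are in the same group when their numbers differ by a multiple of 4.

```python
def two_class_parameters(points, is_first):
    """Brute-force (n1, n2) and the matrices P1, P2 of a two class relation.

    Returns (n, [P1, P2]) with P given as nested tuples; raises ValueError
    if some count depends on the treatment or pair chosen.
    """
    def cls(a, b):
        return 0 if is_first(a, b) else 1

    n_seen, p_seen = set(), {}
    for a in points:
        others = [x for x in points if x != a]
        n_seen.add(tuple(sum(cls(a, x) == i for x in others) for i in range(2)))
        for b in others:
            counts = [[0, 0], [0, 0]]
            for x in points:
                if x != a and x != b:
                    counts[cls(a, x)][cls(b, x)] += 1
            key = tuple(map(tuple, counts))
            if p_seen.setdefault(cls(a, b), key) != key:
                raise ValueError("p^i_jk is not constant")
    if len(n_seen) != 1:
        raise ValueError("n_i is not constant")
    return n_seen.pop(), [p_seen[0], p_seen[1]]


m, n = 4, 3
treatments = list(range(1, m * n + 1))
same_group = lambda a, b: (a - b) % m == 0      # same column of the array
n_found, P_found = two_class_parameters(treatments, same_group)

# Values predicted by (4.1) and (4.2)
n_formula = (n - 1, n * (m - 1))
P_formula = [((n - 2, 0), (0, n * (m - 1))), ((0, n - 1), (n - 1, n * (m - 2)))]
```

The same helper can be applied to any relation on a finite set, which makes it a convenient test of whether a proposed rule of association really defines a two class scheme.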
A PBIB design based on the above association scheme for which the parameters of the second kind are

$b = 9, \quad r = 3, \quad k = 4, \quad \lambda_1 = 0, \quad \lambda_2 = 1$, ... (4.4)

is given below:
a & 2 3), @ 6 aO 4), 
(6, 11, 12, 1), (11, 10, 9, 8 (10, 12, 3, b), we (£5) 
(12, 9. T 4, (7, TEK (9, 3, 4;. 6): 
(b) The triangular association scheme. We take an m×m square, and fill in the m(m-1)/2 positions above the leading diagonal by different treatments, taken in any order. The positions in the leading diagonal are left blank, while positions below this diagonal are filled so that the scheme is symmetrical with respect to the diagonal. Two treatments in the same row (or same column) are first associates. Two treatments which do not occur in the same row or same column are second associates. It is readily verified that the parameters of the association scheme so obtained are

$v = m(m-1)/2, \quad n_1 = 2m-4, \quad n_2 = (m-2)(m-3)/2$, ... (4.6)

$P_1 = \begin{pmatrix} m-2 & m-3 \\ m-3 & (m-3)(m-4)/2 \end{pmatrix}, \quad P_2 = \begin{pmatrix} 4 & 2m-8 \\ 2m-8 & (m-4)(m-5)/2 \end{pmatrix}$. ... (4.7)

This scheme is called the triangular association scheme.

An interesting class of PBIB designs based on the triangular association scheme is formed by taking for blocks, treatments in the same row of the association scheme. For these designs we will have

$b = m, \quad r = 2, \quad k = m-1, \quad \lambda_1 = 1, \quad \lambda_2 = 0$.
As an illustration let us take m = 5. The association scheme is then

×   1   2   3   4
1   ×   5   6   7
2   5   ×   8   9        ... (4.8)
3   6   8   ×  10
4   7   9  10   ×

where two different treatments are first associates if they occur together in the same row (or same column) of the above scheme, and second associates otherwise. The parameters of the association scheme are

$v = 10, \quad n_1 = 6, \quad n_2 = 3$, ... (4.9)

$P_1 = \begin{pmatrix} 3 & 2 \\ 2 & 1 \end{pmatrix}, \quad P_2 = \begin{pmatrix} 4 & 2 \\ 2 & 0 \end{pmatrix}$. ... (4.10)

By taking the rows of (4.8) for blocks we get the design with parameters of the second kind

$b = 5, \quad r = 2, \quad k = 4, \quad \lambda_1 = 1, \quad \lambda_2 = 0$. ... (4.11)

There may be more than one PBIB design based on the same association scheme. For example there exist at least three other designs, besides (4.11), based on the triangular association scheme (4.8). We give below the parameters of the second kind, (4.12), (4.14) and (4.16) together with the corresponding blocks (4.13), (4.15), (4.17) for these designs :

$b = 10, \quad r = 3, \quad k = 3, \quad \lambda_1 = 1, \quad \lambda_2 = 0$. ... (4.12)

(1, 2, 5), (8, 9, 10), (2, 8, 3), (7, 5, 9), (9, 4, 2),
(5, 6, 8), (3, 10, 4), (10, 7, 6), (4, 1, 7), (6, 3, 1). ... (4.13)

$b = 6, \quad r = 3, \quad k = 5, \quad \lambda_1 = 1, \quad \lambda_2 = 2$. ... (4.14)

(1, 8, 9, 7, 3), (1, 8, 4, 10, 5), (8, 4, 6, 7, 2),
(4, 6, 9, 5, 3), (1, 6, 9, 10, 2), (10, 7, 5, 2, 3). ... (4.15)

$b = 10, \quad r = 4, \quad k = 4, \quad \lambda_1 = 1, \quad \lambda_2 = 2$. ... (4.16)

(2, 10, 6, 7), (10, 1, 2, 5), (7, 3, 8, 2), (6, 2, 9, 4), (1, 9, 10, 8),
(5, 4, 3, 10), (8, 7, 4, 1), (3, 5, 7, 9), (9, 6, 1, 3), (4, 8, 5, 6). ... (4.17)
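
That a given set of blocks is a PBIB design on the triangular scheme with m = 5 can be checked by counting, for every pair of treatments, the number of blocks containing both. The sketch below is illustrative only. It assumes the numbering of scheme (4.8), in which treatments 1 to 10 occupy the cells (1,2), (1,3), ..., (4,5) in that order, and it assumes that the last block of (4.13) reads (6, 3, 1), that the blocks of (4.15) are as listed in the code, and that the eighth block of (4.17) reads (3, 5, 7, 9); these readings are the only ones consistent with the stated replication numbers.

```python
from itertools import combinations

# cell[t] = (i, j): treatment t sits in row i, column j (and row j, column i).
cell = dict(enumerate(combinations(range(1, 6), 2), start=1))


def first(a, b):
    """First associates share a row (equivalently a column) of scheme (4.8)."""
    return len(set(cell[a]) & set(cell[b])) == 1


def lambdas(blocks):
    """Sets of concurrence counts over first- and second-associate pairs."""
    seen = {True: set(), False: set()}
    for a, b in combinations(cell, 2):
        seen[first(a, b)].add(sum(a in blk and b in blk for blk in blocks))
    return seen[True], seen[False]


def replications(blocks):
    """Set of numbers of blocks in which the treatments occur."""
    return {sum(t in blk for blk in blocks) for t in cell}


rows = [tuple(t for t in cell if i in cell[t]) for i in range(1, 6)]      # (4.11)
d413 = [(1, 2, 5), (8, 9, 10), (2, 8, 3), (7, 5, 9), (9, 4, 2),
        (5, 6, 8), (3, 10, 4), (10, 7, 6), (4, 1, 7), (6, 3, 1)]
d415 = [(1, 8, 9, 7, 3), (1, 8, 4, 10, 5), (8, 4, 6, 7, 2),
        (4, 6, 9, 5, 3), (1, 6, 9, 10, 2), (10, 7, 5, 2, 3)]
d417 = [(2, 10, 6, 7), (10, 1, 2, 5), (7, 3, 8, 2), (6, 2, 9, 4), (1, 9, 10, 8),
        (5, 4, 3, 10), (8, 7, 4, 1), (3, 5, 7, 9), (9, 6, 1, 3), (4, 8, 5, 6)]

n1 = sum(first(1, t) for t in cell if t != 1)
results = {name: (replications(d), lambdas(d))
           for name, d in [("rows", rows), ("4.13", d413), ("4.15", d415), ("4.17", d417)]}
```

Each entry of `results` gives the replication number and the pair of concurrence sets; a design qualifies when all three sets are singletons.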


(c) The singly linked block (SLB) association scheme. Consider a balanced incomplete block (BIB) design D with b treatments, v blocks, k replications, block size r and $\lambda = 1$, i.e. every pair of treatments occurs in exactly one block. Then

$bk = vr, \quad b-1 = k(r-1)$. ... (4.18)

Consider v new treatments each corresponding to one block of D. Two of these new treatments will be called first associates if the corresponding blocks of D have a common treatment and second associates if the corresponding blocks of D have no common treatment. Shrikhande (1950) has shown that this association relation satisfies the conditions (a), (b), (c) of Section 1 with parameters,

$v = k[(r-1)(k-1)+r]/r, \quad n_1 = r(k-1), \quad n_2 = (k-r)(r-1)(k-1)/r$, ... (4.19)

$P_1 = \begin{pmatrix} (r-1)^2+k-2 & (r-1)(k-r) \\ (r-1)(k-r) & (r-1)(k-r)(k-r-1)/r \end{pmatrix}$, ... (4.20)

$P_2 = \begin{pmatrix} r^2 & r(k-r-1) \\ r(k-r-1) & (k-r)(r-1)(k-1)/r - r(k-r-1) - 1 \end{pmatrix}$. ... (4.21)

This association scheme is defined to be an SLB scheme. Every BIB design with $\lambda = 1$ gives rise to such a scheme.

Let D* be the dual of the design D, i.e. the blocks and treatments of D* correspond to the treatments and blocks of D, and a block of D* contains a treatment if and only if the corresponding treatment of D occurs in the corresponding block of D. Then D* is a PBIB design based on the association scheme just defined for which the parameters of the second kind are

$b, \; r, \; k, \; \lambda_1 = 1, \; \lambda_2 = 0$. ... (4.22)

These designs are known as singly linked block designs.

In the special case r = 2, when the BIB design D consists of all possible pairs of b treatments, the SLB scheme reduces to a triangular scheme with m = k+1.


(d) The Latin square ($L_r$) association scheme. Consider $v = k^2$ treatments which may be set forth in a k×k scheme. Thus if k = 4 and the treatments are 1, 2, ..., 16, we have the scheme

 1   2   3   4
 5   6   7   8        ... (4.23)
 9  10  11  12
13  14  15  16

For the case r = 2, we define two treatments as first associates if they occur in the same row or column of the square scheme, and second associates otherwise. The association scheme so defined may be called the $L_2$ association scheme. The parameters of the $L_2$ scheme are

$v = k^2, \quad n_1 = 2(k-1), \quad n_2 = (k-1)^2$, ... (4.24)

$P_1 = \begin{pmatrix} k-2 & k-1 \\ k-1 & (k-1)(k-2) \end{pmatrix}, \quad P_2 = \begin{pmatrix} 2 & 2(k-2) \\ 2(k-2) & (k-2)^2 \end{pmatrix}$. ... (4.25)

In the general case $2 < r \leq k+1$ we take a set of r-2 mutually orthogonal Latin squares (if such a set exists). For an $L_r$ association scheme we then define two treatments to be first associates if they occur together in the same row or column of the square scheme, or if they correspond to the same symbol of one of the Latin squares. Otherwise we define them to be second associates. For example if k = 4, r = 4 and we take the Latin squares

   [L_1]            [L_2]
1  2  3  4       1  2  3  4
2  1  4  3       3  4  1  2        ... (4.26)
3  4  1  2       4  3  2  1
4  3  2  1       2  1  4  3

then the first associates of the treatment 7 are 5, 6, 8, 3, 11, 15, 4, 10, 13, 1, 12, 14 because the treatment 7 corresponds to the symbol 4 in $[L_1]$ and the symbol 1 in $[L_2]$. The parameters of the $L_r$ association scheme are given by

$v = k^2, \quad n_1 = r(k-1), \quad n_2 = (k-1)(k-r+1)$, ... (4.27)

$P_1 = \begin{pmatrix} (r-1)(r-2)+k-2 & (r-1)(k-r+1) \\ (r-1)(k-r+1) & (k-r)(k-r+1) \end{pmatrix}$, ... (4.28)

$P_2 = \begin{pmatrix} r(r-1) & r(k-r) \\ r(k-r) & (k-r)^2+(r-2) \end{pmatrix}$. ... (4.29)

The simplest class of PBIB designs based on the $L_r$ schemes are the lattice designs, first introduced by Yates (1936a), which are obtained as follows :

We obtain k blocks each of size k by putting in the same block all treatments occurring in the same row of the square scheme. This gives us one complete replication. Another complete replication is obtained in the same way from columns. Again from each of the r-2 mutually orthogonal Latin squares we obtain k blocks giving a complete replication by putting all those treatments which correspond to the same symbol, in the same block. The parameters of the second kind for the lattice design so obtained are

$b = kr, \quad r, \quad k, \quad \lambda_1 = 1, \quad \lambda_2 = 0$. ... (4.30)

For example for the case k = 4, r = 4, the lattice design based on the square scheme (4.23) and the Latin squares $[L_1]$ and $[L_2]$ in (4.26), is

Rep 1, (1, 2, 3, 4), (5, 6, 7, 8), (9, 10, 11, 12), (13, 14, 15, 16),
Rep 2, (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15), (4, 8, 12, 16),
Rep 3, (1, 6, 11, 16), (2, 5, 12, 15), (3, 8, 9, 14), (4, 7, 10, 13),
Rep 4, (1, 7, 12, 14), (2, 8, 11, 13), (3, 5, 10, 16), (4, 6, 9, 15).

(4.31)
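
The construction of the lattice design from the square scheme and the Latin squares is mechanical, and can be sketched as follows. The two Latin squares used here are an assumption: they are the squares whose symbol classes are exactly the blocks of Rep 3 and Rep 4 above, and under which treatment 7 carries symbol 4 and symbol 1 respectively.

```python
from itertools import combinations

k = 4
square = [[k * i + j + 1 for j in range(k)] for i in range(k)]     # scheme (4.23)
L1 = [[1, 2, 3, 4], [2, 1, 4, 3], [3, 4, 1, 2], [4, 3, 2, 1]]      # assumed [L_1]
L2 = [[1, 2, 3, 4], [3, 4, 1, 2], [4, 3, 2, 1], [2, 1, 4, 3]]      # assumed [L_2]


def symbol_blocks(latin):
    """One block per symbol: the treatments in the cells carrying that symbol."""
    return [tuple(square[i][j] for i in range(k) for j in range(k)
                  if latin[i][j] == s) for s in range(1, k + 1)]


rep1 = [tuple(row) for row in square]                                # rows
rep2 = [tuple(square[i][j] for i in range(k)) for j in range(k)]     # columns
design = rep1 + rep2 + symbol_blocks(L1) + symbol_blocks(L2)

# Orthogonality: the 16 ordered pairs of symbols are all different.
orthogonal = len({(L1[i][j], L2[i][j]) for i in range(k) for j in range(k)}) == k * k

# Concurrence counts; first associates are the pairs that occur together.
concurrence = {pair: sum(set(pair) <= set(blk) for blk in design)
               for pair in combinations(range(1, k * k + 1), 2)}
first_of_7 = {t for t in range(1, 17) if t != 7
              and concurrence[tuple(sorted((7, t)))] == 1}
```

Every pair occurs together at most once, and the partners of treatment 7 are the twelve first associates named in the text, in agreement with $n_1 = r(k-1) = 12$.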
If we had chosen only the Latin square $[L_1]$, the design obtained would have been given by the first 3 rows of (4.31). Let us call this design $D_3$. To illustrate the fact that there may be designs other than lattices based on an $L_r$ association scheme, we give below another design based on the same $L_3$ scheme as $D_3$. The parameters of the second kind for this design are

$b = 16, \quad r = 3, \quad k = 3, \quad \lambda_1 = 0, \quad \lambda_2 = 1$, ... (4.32)

and the blocks are


1% 1), O, 9,19, aa 14), (9, 6, 15), 
(13, 4, 6), (14, dy 7) 
(7, 9, 4), (8, 10, 1), (5, 11 


) 
n o 2 S oe a ee 
(2, 16, 9), (3, 13

(e) The cyclic association scheme. Let the treatments be the v integers denoted by 1, 2, ..., v. Let

$d_1, d_2, \ldots, d_{n_1}$ ... (4.34)

be a set of $n_1$ integers satisfying the conditions

(i) the d's are all different, and $0 < d_j < v$ $(j = 1, 2, \ldots, n_1)$,

(ii) among the $n_1(n_1-1)$ differences $d_j - d_{j'}$ $(j, j' = 1, 2, \ldots, n_1; \; j \neq j')$ reduced (mod v), each of the numbers $d_1, d_2, \ldots, d_{n_1}$ occurs g times, whereas each of the numbers $e_1, e_2, \ldots, e_{n_2}$ occurs h times, where $d_1, d_2, \ldots, d_{n_1}, e_1, e_2, \ldots, e_{n_2}$ are all the different v-1 numbers 1, 2, ..., v-1.

Clearly it is necessary that

$n_1 g + n_2 h = n_1(n_1 - 1)$. ... (4.35)

Let the first associates of the treatment i be

$i+d_1, \; i+d_2, \; \ldots, \; i+d_{n_1}$ (mod v), ... (4.36)

and the remaining treatments be second associates of i. The association thus defined is a two class association scheme with parameters $v$, $n_1$, $n_2$,

$P_1 = \begin{pmatrix} g & n_1-g-1 \\ n_1-g-1 & n_2-n_1+g+1 \end{pmatrix}$, ... (4.37)

$P_2 = \begin{pmatrix} h & n_1-h \\ n_1-h & n_2-n_1+h-1 \end{pmatrix}$. ... (4.38)


For example taking v = 13, and

$d_1 = 2, \; d_2 = 5, \; d_3 = 6, \; d_4 = 7, \; d_5 = 8, \; d_6 = 11$,

we find that among the differences $d_j - d_{j'}$ reduced (mod 13) each d occurs twice, whereas every other non-zero integer less than 13 occurs thrice. Hence the conditions (i), (ii) are satisfied. We thus have a cyclic association scheme with parameters

$v = 13, \quad n_1 = 6, \quad n_2 = 6$, ... (4.39)

$P_1 = \begin{pmatrix} 2 & 3 \\ 3 & 3 \end{pmatrix}, \quad P_2 = \begin{pmatrix} 3 & 3 \\ 3 & 2 \end{pmatrix}$. ... (4.40)

An example of a PBIB design based on this association scheme, is provided by the design with the second kind of parameters

$b = 13, \quad r = 3, \quad k = 3, \quad \lambda_1 = 1, \quad \lambda_2 = 0$, ... (4.41)

and the blocks

(1, 3, 9), (2, 4, 10), (3, 5, 11), (4, 6, 12), (5, 7, 13),
(6, 8, 1), (7, 9, 2), (8, 10, 3), (9, 11, 4), (10, 12, 5),
(11, 13, 6), (12, 1, 7), (13, 2, 8). ... (4.42)
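
Condition (ii) and the design (4.42) can both be checked by direct counting. The sketch below assumes that the set of d's is {2, 5, 6, 7, 8, 11}, which is the set of differences occurring within the blocks of (4.42), and that the blocks are obtained by developing the initial block (1, 3, 9) cyclically (mod 13).

```python
from collections import Counter
from itertools import combinations

v = 13
d = [2, 5, 6, 7, 8, 11]          # assumed set d_1, ..., d_6

# Condition (ii): how often each non-zero residue arises as a difference of d's.
diffs = Counter((x - y) % v for x in d for y in d if x != y)
g_values = {diffs[x] for x in d}
h_values = {diffs[x] for x in range(1, v) if x not in d}

# Blocks (4.42), obtained by adding 0, 1, ..., 12 to the initial block.
blocks = [tuple((x - 1 + s) % v + 1 for x in (1, 3, 9)) for s in range(v)]


def is_first(a, b):
    return (a - b) % v in d


lam1, lam2 = set(), set()
for a, b in combinations(range(1, v + 1), 2):
    together = sum(a in blk and b in blk for blk in blocks)
    (lam1 if is_first(a, b) else lam2).add(together)
```

With these assumptions `g_values` is {2} and `h_values` is {3}, so that (4.35) reads 6·2 + 6·3 = 6·5, and the concurrence sets reduce to $\lambda_1 = 1$, $\lambda_2 = 0$.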


5. ASSOCIATION MATRICES AND THE ALGEBRA OF ASSOCIATION SCHEMES


Consider an m class association scheme. The i-th association matrix is defined by

$B_i = (b^i_{\alpha\beta}) = \begin{pmatrix} b^i_{11} & b^i_{12} & \cdots & b^i_{1v} \\ b^i_{21} & b^i_{22} & \cdots & b^i_{2v} \\ \vdots & \vdots & & \vdots \\ b^i_{v1} & b^i_{v2} & \cdots & b^i_{vv} \end{pmatrix}$, ... (5.1)

where

$b^i_{\alpha\beta} = 1$ if the treatments $\alpha$ and $\beta$ are i-th associates, $\quad = 0$ otherwise. ... (5.2)

Hence $B_i$ is a symmetric v×v matrix in which the element in the $\alpha$-th row and $\beta$-th column is unity if the treatments $\alpha$ and $\beta$ are i-th associates and 0 otherwise. The total of each row and each column in $B_i$ is $n_i$. Clearly

$B_0 = I_v$, the v×v identity matrix. ... (5.3)

It is also readily seen that

$B_0 + B_1 + \cdots + B_m = J_v$, ... (5.4)

where $J_v$ is the v×v matrix each of whose elements is unity. It is also clear that the linear form

$c_0B_0 + c_1B_1 + \cdots + c_mB_m$ ... (5.5)

is equal to the zero matrix if and only if

$c_0 = c_1 = \cdots = c_m = 0$.

Hence linear functions of $B_0, B_1, \ldots, B_m$ form a vector space with basis $B_0, B_1, \ldots, B_m$. Thompson (1954, 1958) and independently Mesner (1956) proved the fundamental formula

$B_jB_k = p^0_{jk}B_0 + p^1_{jk}B_1 + \cdots + p^m_{jk}B_m$. ... (5.6)

This shows that the product of any two matrices of the form (5.5) can be expressed as a linear combination of the B's, and is thus again of the form (5.5). The set of matrices which are linear functions of the association matrices $B_0, B_1, \ldots, B_m$ therefore form a linear associative algebra. From the associative law

$B_i(B_jB_k) = (B_iB_j)B_k$,

one can show that (Bose and Mesner, 1959),

$\sum_u p^u_{jk}\, p^t_{iu} = \sum_u p^u_{ij}\, p^t_{uk}$, ... (5.7)

where u runs from 0 to m and the remaining indices are arbitrary but fixed and $0 \leq i, j, k, t \leq m$.
Now let us define $\Pi_k$ by

$\Pi_k = (p^i_{jk}) = \begin{pmatrix} p^0_{0k} & p^1_{0k} & \cdots & p^m_{0k} \\ p^0_{1k} & p^1_{1k} & \cdots & p^m_{1k} \\ \vdots & \vdots & & \vdots \\ p^0_{mk} & p^1_{mk} & \cdots & p^m_{mk} \end{pmatrix}, \quad k = 0, 1, \ldots, m$. ... (5.8)

Then (5.7) is equivalent to

$\Pi_j\Pi_k = p^0_{jk}\Pi_0 + p^1_{jk}\Pi_1 + \cdots + p^m_{jk}\Pi_m$. ... (5.9)

Thus the matrices $\Pi_k$ given by (5.8) multiply in the same manner as the association matrices $B_k$ (k = 0, 1, ..., m). From (2.7) it follows that if

$c_0\Pi_0 + c_1\Pi_1 + \cdots + c_m\Pi_m = 0$,

then $c_0 = c_1 = \cdots = c_m = 0$;

i.e. $\Pi_0, \Pi_1, \ldots, \Pi_m$ are linearly independent. They thus form the basis for a vector space and combine in the same way as the B's under addition as well as multiplication. They provide a regular representation in (m+1)×(m+1) matrices of the algebra given by the B's, which are v×v matrices. In particular

$\Pi_0 = I_{m+1}$. ... (5.10)

This regular representation was discovered independently by Bose (1955) and by Mesner (1956), and further properties were studied in a joint paper (1959) by both. Some of these will be mentioned later.
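
The fundamental formula (5.6) and its counterpart (5.9) for the representing matrices can be verified numerically on a small scheme. The sketch below uses the cyclic scheme with v = 13 of Section 4(e), under the assumption that its first-associate differences are {2, 5, 6, 7, 8, 11}; the full set of $p^i_{jk}$ (including zero indices) is filled in from (4.40) together with the conventions (2.5) to (2.7), and $\Pi_k$ is taken with row index j and column index i.

```python
v, d = 13, {2, 5, 6, 7, 8, 11}          # assumed cyclic scheme of Section 4(e)


def cls(a, b):
    """Associate class (0, 1 or 2) of the pair (a, b)."""
    diff = (a - b) % v
    return 0 if diff == 0 else (1 if diff in d else 2)


# Association matrices B_0, B_1, B_2 as nested lists.
B = [[[int(cls(a, b) == i) for b in range(v)] for a in range(v)] for i in range(3)]

# p[i][j][k] stands for p^i_jk, i, j, k = 0, 1, 2.
p = [[[1, 0, 0], [0, 6, 0], [0, 0, 6]],
     [[0, 1, 0], [1, 2, 3], [0, 3, 3]],
     [[0, 0, 1], [0, 3, 3], [1, 3, 2]]]

# Pi[k][j][i] = p^i_jk, i.e. row j and column i of the matrix Pi_k.
Pi = [[[p[i][j][k] for i in range(3)] for j in range(3)] for k in range(3)]


def matmul(X, Y):
    size = len(X)
    return [[sum(X[a][c] * Y[c][b] for c in range(size)) for b in range(size)]
            for a in range(size)]


def combination(mats, j, k):
    """The linear combination sum_u p^u_jk * mats[u]."""
    size = len(mats[0])
    return [[sum(p[u][j][k] * mats[u][a][b] for u in range(3)) for b in range(size)]
            for a in range(size)]


index_pairs = [(j, k) for j in range(3) for k in range(3)]
ok_B = all(matmul(B[j], B[k]) == combination(B, j, k) for j, k in index_pairs)
ok_Pi = all(matmul(Pi[j], Pi[k]) == combination(Pi, j, k) for j, k in index_pairs)
```

Both checks succeed, which illustrates that the 3×3 matrices $\Pi_k$ multiply exactly as the 13×13 matrices $B_k$ do.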

Since the B's are commutative, the $\Pi$'s are commutative. In general they are not incidence matrices and are not symmetric. In analogy with (5.4), all the elements of the row j of $\sum_k \Pi_k$ are equal to $n_j$. Let

$B = c_0B_0 + c_1B_1 + \cdots + c_mB_m$ ... (5.11)

be any element of our algebra, and let $f(\lambda)$ be a polynomial. Then we can express

$f(B) = e_0B_0 + e_1B_1 + \cdots + e_mB_m$. ... (5.12)

If

$\Pi = c_0\Pi_0 + c_1\Pi_1 + \cdots + c_m\Pi_m$ ... (5.13)

is the representation of B, then

$f(\Pi) = e_0\Pi_0 + e_1\Pi_1 + \cdots + e_m\Pi_m$. ... (5.14)

Let $f(\lambda)$ be the minimum function of B, i.e. $f(\lambda)$ is the monic polynomial of least degree for which

$f(B) = 0$.

Similarly let $\phi(\lambda)$ be the minimum function of $\Pi$. Then

$f(B) = 0 \;\Rightarrow\; \sum_i e_iB_i = 0 \;\Rightarrow\; e_i = 0 \; (i = 0, 1, \ldots, m) \;\Rightarrow\; f(\Pi) = 0$,

i.e. $f(\lambda)$ is divisible by $\phi(\lambda)$. In the same way $\phi(\lambda)$ is divisible by $f(\lambda)$. Since both are monic polynomials

$f(\lambda) = \phi(\lambda)$.

Hence B and $\Pi$ have the same distinct characteristic roots, and every matrix B has at most m+1 distinct roots, which are solutions of the minimum equation of $\Pi$. In general the number of treatments is much larger than the number of classes m, so that the roots of B have high multiplicities.

Since $B_jB_k = B_kB_j$ it follows that

$\Pi_j\Pi_k = \Pi_k\Pi_j$. ... (5.15)

It is easy to show that the relations (2.4), (2.8), (2.9) among the parameters of the association scheme are consequences of (5.15). However, this relation leads to new identities when m > 2. For example in the case m = 3, Bose and Mesner (1959) show that there is a new identity


1 1 1 a a, a. 
a (mt =) o Se i ari “URTË Sp sa 
m nn + rs + A + Mg—ng—ng—l 


$ ( G33 Ca 4 G12 Gog -4-223 Ca 
Ng 


g 
Ny Ng Nifas Melts — Nala nn) =0, ... (5.16) 


where dy = ees 2 
12 MiP = NPs, day = Nap}, = MPig, Ay = Ngpës = Napa 


a A 
T = MPs = Nig = Ngpo. 


This identity is not derivable from (2.4), (2.8), (2.9).

Given a set of positive integral parameters $v$, $n_i$, $p^i_{jk}$ (i, j, k = 0, 1, ..., m) satisfying (2.5), (2.6), (2.7), one may ask whether it is possible for an m class association scheme with these parameters to exist. Then the matrix equations (5.15) provide a necessary condition for existence.


6. COMBINATORIAL APPLICATIONS OF THE ALGEBRA OF ASSOCIATION MATRICES


(a) Consider a PBIB design based on an m class association scheme with association matrices $B_i$ defined by (5.1). Let

$N = (n_{ij}), \quad i = 1, 2, \ldots, v; \; j = 1, 2, \ldots, b$ ... (6.1)

be the incidence matrix of the design, i.e. $n_{ij} = 1$ or 0 according as the treatment i does or does not occur in the j-th block. Then

$B = NN' = rB_0 + \lambda_1B_1 + \cdots + \lambda_mB_m$, ... (6.2)

$\Pi = r\Pi_0 + \lambda_1\Pi_1 + \cdots + \lambda_m\Pi_m$. ... (6.3)

The elements of NN' are non-negative and for connected designs, i.e. designs in which every treatment contrast is estimable, NN' is irreducible. Also in virtue of the identity (2.10) the sum of the elements in every row of NN' is rk. Hence

$\frac{1}{rk}\,NN'$ ... (6.4)

is a stochastic matrix (i.e. an irreducible matrix for which the sum of each row is unity). For such a matrix (Brauer, 1952), unity is a simple root and is greater than all the other roots. Hence rk is a simple root of B and is, therefore, also a simple root of $\Pi$.

Let

$p^i_j = r\,p^i_{j0} + \lambda_1 p^i_{j1} + \cdots + \lambda_m p^i_{jm}$. ... (6.5)

Then $\Pi = (p^i_j)$. If $\theta$ is a characteristic root of $\Pi$ we have

$\begin{vmatrix} p^0_0-\theta & p^1_0 & \cdots & p^m_0 \\ p^0_1 & p^1_1-\theta & \cdots & p^m_1 \\ \vdots & \vdots & & \vdots \\ p^0_m & p^1_m & \cdots & p^m_m-\theta \end{vmatrix} = 0$.

Now, writing $\lambda_0 = r$ and using (2.1), (2.9) and (2.10),

$\sum_{j=0}^{m} p^i_j = \sum_{j=0}^{m}\sum_{k=0}^{m} \lambda_k p^i_{jk} = \sum_{k=0}^{m} \lambda_k n_k = rk$.

Hence

$(rk-\theta)\begin{vmatrix} 1 & 1 & \cdots & 1 \\ p^0_1 & p^1_1-\theta & \cdots & p^m_1 \\ \vdots & \vdots & & \vdots \\ p^0_m & p^1_m & \cdots & p^m_m-\theta \end{vmatrix} = 0$.

Therefore the characteristic roots of $\Pi$ other than rk, are roots of the matrix

$\Pi^* = (p^{*i}_j), \quad i, j = 1, 2, \ldots, m$, ... (6.6)

where

$p^{*i}_j = p^i_j - p^0_j = r\delta_{ij} + \lambda_1 p^i_{j1} + \cdots + \lambda_m p^i_{jm} - n_j\lambda_j$, ... (6.7)

$\delta_{ij}$ being the Kronecker delta.


(b) Fisher (1940) showed that for a balanced incomplete block (BIB) design there holds the inequality

$b \geq v$, ... (6.8)

where b is the number of blocks and v is the number of treatments. (For an alternative proof see Bose (1949)). One may ask what the corresponding result is for PBIB designs. Now

$b \geq \text{rank } N \geq \text{rank } NN'$.

But the rank of NN' is v, unless NN' is singular, i.e., has a zero characteristic root, in which case $\Pi$ and therefore $\Pi^*$ has a zero characteristic root. Thus:

A necessary condition for b < v in a PBIB design is

$|p^{*i}_j| = 0$, ... (6.9)

where $p^{*i}_j$ is given by (6.7).

Thus Fisher's inequality $b \geq v$ is satisfied in general. It can be violated by only those designs for which (6.9) is satisfied. This result is due to Nair (1948). An alternative proof will be found in Bose (1952).


(c) The multiplicities of the characteristic roots of B = NN' can be calculated. Let $\theta_0 = rk, \theta_1, \ldots, \theta_m$ be these characteristic roots with multiplicities $\alpha_0 = 1, \alpha_1, \ldots, \alpha_m$. Then $\theta_0, \theta_1, \ldots, \theta_m$ are the characteristic roots of $\Pi$. Remembering that the trace of any matrix is equal to the sum of its characteristic roots we have

$\alpha_0\theta_0^n + \alpha_1\theta_1^n + \cdots + \alpha_m\theta_m^n = \text{tr } B^n, \quad n = 0, 1, 2, \ldots$ ... (6.10)

Using the fundamental formula, (5.6), we may express $B^n$ in the form

$B^n = c_{0n}B_0 + c_{1n}B_1 + \cdots + c_{mn}B_m$. ... (6.11)

Then since $B_0$ is the only $B_i$ with non-zero diagonal elements

$\text{tr } B^n = v\,c_{0n}$. ... (6.12)

We shall illustrate the determination of the multiplicities $\alpha_i$ by considering the case m = 2. In this case $\theta_1$ and $\theta_2$ are the roots of the matrix

$\Pi^* = \begin{pmatrix} r+\lambda_1p^1_{11}+\lambda_2p^1_{12}-n_1\lambda_1 & \lambda_1p^2_{11}+\lambda_2p^2_{12}-n_1\lambda_1 \\ \lambda_1p^1_{21}+\lambda_2p^1_{22}-n_2\lambda_2 & r+\lambda_1p^2_{21}+\lambda_2p^2_{22}-n_2\lambda_2 \end{pmatrix}$. ... (6.13)

Using the identities connecting the parameters we get after some computation the result that $\theta_1$, $\theta_2$ satisfy the quadratic

$(r-\theta)^2 + [(\lambda_1-\lambda_2)(p^2_{12}-p^1_{12}) - (\lambda_1+\lambda_2)](r-\theta) + (\lambda_1-\lambda_2)(\lambda_2p^1_{12}-\lambda_1p^2_{12}) + \lambda_1\lambda_2 = 0$. ... (6.14)

Setting

$\gamma = p^2_{12}-p^1_{12}, \quad \beta = p^1_{12}+p^2_{12}, \quad \Delta = \gamma^2+2\beta+1$, ... (6.15)

we have

$\theta_1 = r - \tfrac{1}{2}[(\lambda_2-\lambda_1)(\sqrt{\Delta}+\gamma) + (\lambda_1+\lambda_2)]$,
$\theta_2 = r - \tfrac{1}{2}[(\lambda_1-\lambda_2)(\sqrt{\Delta}-\gamma) + (\lambda_1+\lambda_2)]$. ... (6.16)

Now from (6.10)

$\text{tr } I = 1+\alpha_1+\alpha_2 = v$, ... (6.17)

$\text{tr } NN' = rk+\alpha_1\theta_1+\alpha_2\theta_2 = vr$, ... (6.18)

whence after some calculation

$\alpha_1 = \dfrac{n_1+n_2}{2} - \dfrac{(n_1-n_2)+\gamma(n_1+n_2)}{2\sqrt{\Delta}}$, ... (6.19)

$\alpha_2 = \dfrac{n_1+n_2}{2} + \dfrac{(n_1-n_2)+\gamma(n_1+n_2)}{2\sqrt{\Delta}}$. ... (6.20)

It is interesting to note that the multiplicities $\alpha_1$ and $\alpha_2$ depend only on the parameters of the association scheme, i.e., parameters of the first kind. This is a special case of the following result due to Bose and Mesner (1954) :

If $B = c_0B_0+c_1B_1+\cdots+c_mB_m$ is a linear function of the association matrices of an association scheme, then

$|B-\theta I| = (\theta_0-\theta)^{\alpha_0}(\theta_1-\theta)^{\alpha_1}\cdots(\theta_m-\theta)^{\alpha_m}$, ... (6.21)

where the multiplicities $\alpha_0, \alpha_1, \ldots, \alpha_m$ are independent of $c_0, c_1, \ldots, c_m$. This remains true if B is given by (6.2). Hence the multiplicity of the roots of NN' is independent of $r, \lambda_1, \lambda_2, \ldots, \lambda_m$.

Since the multiplicities $\alpha_i$ are expressible in terms of the parameters of the association scheme, we cannot have a set of parameters leading to nonintegral values $\alpha_i$. This can be used to prove the impossibility of certain association schemes.

We are now in a position to see how Fisher's inequality should be modified for the case when $\Pi$ and therefore $\Pi^*$ has a zero characteristic root. Let $\alpha$ be the multiplicity of the root zero of B = NN'. Then

$b \geq \text{rank } N \geq \text{rank } NN' = v-\alpha$.

Hence Fisher's inequality is replaced by

$b \geq v-\alpha$. ... (6.22)

This result is due to Connor and Clatworthy (1954).
(d) Since the distinct roots $\theta_0, \theta_1, \ldots, \theta_m$ of NN' are the roots of $\Pi$, the $\theta$'s are functions of parameters of the design. Since NN' is symmetric and non-negative definite the $\theta$'s are all real and non-negative. This fact yields interesting inequalities among the parameters of the design.


For PBIB designs based on two class association schemes we get the inequalities,

$r \geq \tfrac{1}{2}[(\lambda_2-\lambda_1)(\sqrt{\Delta}+\gamma)+(\lambda_1+\lambda_2)]$, ... (6.23)

$r \geq \tfrac{1}{2}[(\lambda_1-\lambda_2)(\sqrt{\Delta}-\gamma)+(\lambda_1+\lambda_2)]$. ... (6.24)


(e) We shall illustrate some of the above results with reference to group divisible PBIB designs, based on the group divisible (GD) association scheme, defined earlier. From (4.2) and (6.15) we have

$\gamma = n-1, \quad \beta = n-1, \quad \Delta = n^2$. ... (6.25)

Hence from (6.16), (6.19), (6.20), using the identities connecting the parameters

$\theta_1 = r+n(\lambda_1-\lambda_2)-\lambda_1 = rk-v\lambda_2, \quad \theta_2 = r-\lambda_1$, ... (6.26)

$\alpha_1 = m-1, \quad \alpha_2 = v-m$. ... (6.27)


We therefore have the following result:

Group divisible designs can be divided into three classes :

(i) Regular group divisible designs for which $rk-v\lambda_2 > 0$, $r-\lambda_1 > 0$. For these Fisher's inequality $b \geq v$ holds.

(ii) Semiregular group divisible designs for which $r-\lambda_1 > 0$, $rk-v\lambda_2 = 0$. For these designs $b \geq v-m+1$.

(iii) Singular group divisible designs for which $r-\lambda_1 = 0$. For these designs $b \geq m$.
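
The roots, their multiplicities and the resulting classification can be computed directly from the parameters of a group divisible design. The sketch below is an illustration, with hypothetical function names, taking $\theta_1$, $\theta_2$ in the form $r - \tfrac12[(\lambda_2-\lambda_1)(\sqrt{\Delta}+\gamma)+(\lambda_1+\lambda_2)]$ and $r - \tfrac12[(\lambda_1-\lambda_2)(\sqrt{\Delta}-\gamma)+(\lambda_1+\lambda_2)]$, and $\alpha_1, \alpha_2 = (n_1+n_2)/2 \mp [(n_1-n_2)+\gamma(n_1+n_2)]/(2\sqrt{\Delta})$. It is applied to the design (4.4), for which m = 4, n = 3, b = 9, r = 3, k = 4, $\lambda_1 = 0$, $\lambda_2 = 1$.

```python
from fractions import Fraction
from math import isqrt


def gd_roots(m, n, r, lam1, lam2):
    """theta_1, theta_2, alpha_1, alpha_2 for a GD design, from the general formulas."""
    n1, n2 = n - 1, n * (m - 1)                     # (4.1)
    p1_12, p2_12 = 0, n - 1                         # off-diagonal entries of (4.2)
    gamma, beta = p2_12 - p1_12, p1_12 + p2_12      # (6.15)
    root = isqrt(gamma ** 2 + 2 * beta + 1)         # sqrt(Delta), equal to n here
    theta1 = r - Fraction((lam2 - lam1) * (root + gamma) + lam1 + lam2, 2)
    theta2 = r - Fraction((lam1 - lam2) * (root - gamma) + lam1 + lam2, 2)
    eta = (n1 - n2) + gamma * (n1 + n2)
    alpha1 = Fraction(n1 + n2, 2) - Fraction(eta, 2 * root)
    alpha2 = Fraction(n1 + n2, 2) + Fraction(eta, 2 * root)
    return theta1, theta2, alpha1, alpha2


def classify(m, n, r, k, lam1, lam2):
    """Class of a GD design and the corresponding lower bound for b."""
    v = m * n
    if r - lam1 == 0:
        return "singular", m
    if r * k - v * lam2 == 0:
        return "semiregular", v - m + 1
    return "regular", v


m, n, b, r, k, lam1, lam2 = 4, 3, 9, 3, 4, 0, 1     # design (4.4)
theta1, theta2, alpha1, alpha2 = gd_roots(m, n, r, lam1, lam2)
kind, bound = classify(m, n, r, k, lam1, lam2)
```

For this design the general formulas give $\theta_1 = rk - v\lambda_2 = 0$ and $\theta_2 = r - \lambda_1 = 3$ with multiplicities 3 and 8, so the design is semiregular and attains the bound b = v - m + 1 = 9.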


From (6.21) it follows that for a group divisible design

$|NN'| = rk\,(rk-v\lambda_2)^{m-1}(r-\lambda_1)^{m(n-1)}$. ... (6.28)

Hence for a regular symmetrical GD design

$(\det N)^2 = |NN'| = r^2(r^2-v\lambda_2)^{m-1}(r-\lambda_1)^{m(n-1)}$. ... (6.29)

It follows that for a regular symmetrical GD design (i) if m is even, then $r^2-v\lambda_2$ is a perfect square; (ii) if m is odd and n is even, then $r-\lambda_1$ must be a perfect square.

The results given above were first obtained by Bose and Connor (1952), by 
a more direct method. 

We have used the group divisible designs as illustration but it is clear that 
similar results can be proved for PBIB designs based on the other association schemes 
described in Section 4. 


(f) We shall now examine for two class association schemes and PBIB designs, the consequences of the integral or non-integral nature of $\sqrt{\Delta}$, and the equality or inequality of the multiplicities $\alpha_1$ and $\alpha_2$ of the characteristic roots of NN' (other than the simple root rk).


These results are due to Connor and Clatworthy (1954). 


The parameters $n_1$ and $n_2$ are both integers. If $n_2$ is zero any two treatments will be first associates, and the corresponding design will become a BIB. We shall exclude this trivial case and assume that $n_1$ and $n_2$ are non-zero positive integers.


We shall set

$\eta = (n_1-n_2)+\gamma(n_1+n_2)$. ... (6.30)


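(i) If $\Delta$ is not an integral square $\alpha_1 = \alpha_2$. Conversely if $\alpha_1 = \alpha_2$, $\Delta$ may or may not be an integral square. From (2.2), (6.19), (6.20) and (6.30)

$\alpha_1 = \dfrac{v-1}{2} - \dfrac{\eta}{2\sqrt{\Delta}}, \quad \alpha_2 = \dfrac{v-1}{2} + \dfrac{\eta}{2\sqrt{\Delta}}$. ... (6.31)

If $\Delta$ is not an integral square, we must have $\eta = 0$, since $\alpha_1$ and $\alpha_2$ must be rational. Hence $\alpha_1 = \alpha_2$. Conversely if $\alpha_1 = \alpha_2$, then $\eta = 0$, but this does not put any restriction on $\Delta$. Consider for example the cyclic association scheme with parameters (4.39), (4.40). In this case $\alpha_1 = \alpha_2 = 6$, $\Delta = 13$ (not an integral square). Again for the $L_2$ association scheme with parameters (4.24), (4.25) taking k = 3, we have $\alpha_1 = \alpha_2 = 4$, $\Delta = 9$ (an integral square).

(ii) If $\alpha_1 = \alpha_2$ (which is necessarily the case when $\Delta$ is not an integral square), then

$n_1 = n_2 = (v-1)/2 = 2t, \quad \Delta = v = 4t+1$,

$P_1 = \begin{pmatrix} t-1 & t \\ t & t \end{pmatrix}, \quad P_2 = \begin{pmatrix} t & t \\ t & t-1 \end{pmatrix}$. ... (6.32), (6.33), (6.34)

Since $\alpha_1 = \alpha_2$ we have $\eta = 0$ or

$\gamma = p^2_{12}-p^1_{12} = \dfrac{n_2-n_1}{n_2+n_1}$.

Since $\gamma$ is an integer and the right hand side is numerically less than unity, it follows that $\gamma = 0$, $n_2 = n_1$, and $p^1_{12} = p^2_{12} = t$ (say). The other results easily follow.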


(iii) If v is even, $\Delta$ must be an integral square and $\eta$ given by (6.30), must be a non-zero integer divisible by $\sqrt{\Delta}$. This follows from (6.31) by noting that $\eta/\sqrt{\Delta}$ must be an odd integer.

(iv) If v is odd, then either $\eta = 0$, $\alpha_1 = \alpha_2$ (in which case (6.32), (6.33), (6.34) hold) or $\eta$ is a non-zero even integer, and $\Delta$ is an integral square. This follows by noting that from (6.31), $\eta/2\sqrt{\Delta}$ must be integral.


7. PARTIAL GEOMETRIES AND THE CORRESPONDING PBIB DESIGNS 


A partial geometry (r, k, t) is a system of points and lines, and a relation of incidence between them satisfying the following axioms:


A1. Any two distinct points are incident with not more than one line.

A2. Each point is incident with r lines.

A3. Each line is incident with k points.

A4. If the point P is not incident with the line l, there are exactly t lines ($t \geq 1$) which are incident with P, and also incident with some point incident with l.

Clearly $1 \leq t \leq k$, $1 \leq t \leq r$.

(a) If there were two distinct lines l and m each incident with two distinct points $P_1$ and $P_2$, then A1 would be contradicted. Hence

A'1. Any two distinct lines are incident with not more than one point.

Given a partial geometry (r, k, t), there exists a dual partial geometry (k, r, t), obtained by calling the points of the first, the lines of the second; and the lines of the first, the points of the second.

The above follows by noting the duality of A1 and A'1, the duality of A2 and A3, and the self-dual nature of A4.

For convenience we may use ordinary geometric language. Thus if a point is incident with a line, we may say that the point lies on the line, or that the line passes through the point. If two points are incident with a line, we may say that the line joins the two points. If a point is incident with each of two lines, we may say that the lines intersect in the point. With this language A4 becomes

A4. Through any point P not lying on a line l, there pass exactly t lines intersecting l.

It is easy to show that the number of points v, and the number of lines b in a partial geometry (r, k, t) are given by

$v = k[(r-1)(k-1)+t]/t$, ... (7.1)

$b = r[(r-1)(k-1)+t]/t$. ... (7.2)

(b) Partial geometries are isomorphic with a class of PBIB designs. Points of the geometry may be called treatments and lines of the geometry may be called blocks. The relation of incidence now becomes the relation of a treatment being contained in a block. Two treatments can be called first associates if they occur together in a block and second associates if they do not occur together in a block. Using the axioms A1 to A4 one can now prove that the association relations thus defined, satisfy the conditions (a), (b) of Section 1, and the condition (c') of Section 3. The parameters of the two class association scheme thus obtained are

$v = k[(r-1)(k-1)+t]/t, \quad n_1 = r(k-1), \quad n_2 = (r-1)(k-1)(k-t)/t$, ... (7.3)

$P_1 = \begin{pmatrix} (t-1)(r-1)+k-2 & (r-1)(k-t) \\ (r-1)(k-t) & (r-1)(k-t)(k-t-1)/t \end{pmatrix}$, ... (7.4)

$P_2 = \begin{pmatrix} rt & r(k-t-1) \\ r(k-t-1) & (r-1)(k-1)(k-t)/t - r(k-t-1) - 1 \end{pmatrix}$. ... (7.5)

This association scheme may be called the geometric association scheme with characteristics (r, k, t).

Thus a partial geometry is equivalent to a PBIB design with parameters of the second kind

$v, \; b, \; r, \; k, \; \lambda_1 = 1, \; \lambda_2 = 0$, ... (7.6)

the parameters of the association scheme being given by (7.3), (7.4), (7.5).

(c) Bose and Clatworthy (1955) considered two class PBIB designs with $r < k$, $\lambda_1 = 1$, $\lambda_2 = 0$. From the results of Section 6, it follows that for such designs the matrix $\Pi^*$ given by (6.13) has a zero characteristic root, if we take $\lambda_1 = 1$, $\lambda_2 = 0$. Hence from (6.14)

$r\,p^1_{12} - (r-1)\,p^2_{12} = r(r-1)$. ... (7.7)

Using this relation, and the identities (1.3), (2.1), (2.2), (2.3), (2.4) they showed that the parameters of the design are given by (7.3), (7.4), (7.5), (7.6). This raises the interesting question, whether such a design is a partial geometry. The answer is in the affirmative. Since the axioms A1, A2, A3 are evidently satisfied, it only remains to show that A4 is also satisfied.


Let K be the set of k treatments contained in a particular block, and let
E be the set of remaining v−k treatments. Let g(x) denote the number of treatments
in E which have exactly x first associates in K. Then

\[ \sum_{x=0}^{k} g(x) = v-k = k(k-1)(r-1)/t. \tag{7.8} \]

By counting the number of pairs (P, Q), where P is a treatment in K, Q is a
treatment in E, and P and Q are first associates, we see that

\[ \sum_{x=0}^{k} x\,g(x) = k(n_1-k+1) = k(r-1)(k-1). \tag{7.9} \]

Similarly by counting the number of triplets $(P_1, P_2, Q)$ where $P_1, P_2$ is an
ordered pair of distinct treatments in K, and Q is a treatment in E which is a first
associate of both $P_1$ and $P_2$, we get

\[ \sum_{x=0}^{k} x(x-1)\,g(x) = k(k-1)(p^1_{11}-k+2) = k(k-1)(r-1)(t-1). \tag{7.10} \]

A simple calculation shows that $\bar{x}$, the average value of x, is

\[ \bar{x} = \sum x\,g(x) \Big/ \sum g(x) = t, \tag{7.11} \]

and that

\[ \operatorname{var} x \cdot \sum_{x=0}^{k} g(x) = \sum_{x=0}^{k} g(x)(x-t)^2 = 0. \tag{7.12} \]

Hence x must always have the value t. This is equivalent to the axiom A4. Hence
a PBIB design with r replications, block size k, $\lambda_1 = 1$, $\lambda_2 = 0$, is a partial geometry
(r, k, t) if r < k.
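The step from (7.8), (7.9), (7.10) to (7.11), (7.12) is the "simple calculation" mentioned above, and it can be checked with exact arithmetic. The following is a minimal sketch rather than part of the original argument: the function name `first_associate_moments` is hypothetical, and the three sums are assumed to be the right-hand sides of (7.8), (7.9) and (7.10).

```python
from fractions import Fraction


def first_associate_moments(r, k, t):
    """Return (mean, variance) of x computed from the sums (7.8)-(7.10).

    Assumes r >= 2, k >= 2 and t >= 1 so that the sum in (7.8) is non-zero.
    """
    s0 = Fraction(k * (k - 1) * (r - 1), t)          # (7.8):  sum of g(x)
    s1 = Fraction(k * (r - 1) * (k - 1))             # (7.9):  sum of x g(x)
    s2 = Fraction(k * (k - 1) * (r - 1) * (t - 1))   # (7.10): sum of x(x-1) g(x)
    mean = s1 / s0
    second_moment = (s2 + s1) / s0                   # sum of x^2 g(x) over sum of g(x)
    return mean, second_moment - mean ** 2


if __name__ == "__main__":
    for r, k, t in [(2, 5, 1), (3, 5, 1), (3, 4, 2), (4, 7, 3)]:
        mean, var = first_associate_moments(r, k, t)
        print((r, k, t), "mean =", mean, "variance =", var)
```

For each triple the mean equals t and the variance is zero, which is the content of (7.11) and (7.12).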

(d) One may ask whether a partial geometry (r, k, t) exists for all values of
r, k, t. Now for the corresponding PBIB design the multiplicity $\alpha_1$ of the characteristic
root $\theta_1$ of the matrix $NN'$ is given by (6.19). Substituting for $n_1$, $n_2$, $v$
and $\Delta$ from (6.15), (7.3), (7.4) and (7.5) we have

\[ \alpha_1 = \frac{rk(r-1)(k-1)}{t(k+r-t-1)}. \tag{7.13} \]


Hence a necessary condition for the existence of a partial geometry (r, k, t) is
that the number $\alpha_1$ given by (7.13) is a positive integer. For example if r = 3, t = 1
then the only possible values of k are k = 2, 3, 5 and 11. The cases k = 2, 3, 5 are
possible, but a rather lengthy combinatorial argument (Bose and Clatworthy, 1955)
shows the case k = 11 to be impossible.
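The integrality condition can be explored numerically. The sketch below is illustrative only: the helper names are hypothetical, and (7.13) is assumed to read $\alpha_1 = rk(r-1)(k-1)/[t(k+r-t-1)]$. It lists the block sizes k that pass the necessary condition for given r and t; passing the test does not by itself imply existence, as the case k = 11 shows.

```python
from fractions import Fraction


def alpha1(r, k, t):
    """Multiplicity (7.13) as an exact fraction (assumed form of the formula)."""
    return Fraction(r * k * (r - 1) * (k - 1), t * (k + r - t - 1))


def admissible_block_sizes(r, t, k_max):
    """Block sizes 2 <= k <= k_max (with k >= t) for which alpha1 is a positive integer."""
    sizes = []
    for k in range(max(2, t), k_max + 1):
        a = alpha1(r, k, t)
        if a > 0 and a.denominator == 1:
            sizes.append(k)
    return sizes


if __name__ == "__main__":
    # For r = 3, t = 1 the condition reduces to (k + 1) dividing 12.
    print(admissible_block_sizes(3, 1, 1000))  # prints [2, 3, 5, 11]
```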


(e) We shall now consider some examples of partial geometries.
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(i) Consider the lattice designs described in Section 4(d) based on the Latin 
square scheme $L_r$. The parameters of these designs are given by (4.27), (4.28), (4.29),
(4.30). If we consider the treatments as points and blocks as lines, then the axioms
A1, A2, A3 are obviously satisfied. Now it follows from the method of derivation
of the lattice designs from orthogonal Latin squares, that any two blocks of different
replications have exactly one treatment in common. Suppose there is a treatment
θ and a block B* of the i-th replication not containing θ. Then there will be exactly
one block in each replication which contains θ. Let $B_1, B_2, \ldots, B_i, \ldots, B_r$ be these
blocks. Then $B_i$ has no treatment in common with B*, whereas the other blocks
$B_1, \ldots, B_{i-1}, B_{i+1}, \ldots, B_r$ have each one treatment in common with B*. This shows
that axiom A4 is satisfied with t = r−1. Hence a lattice design with parameters
(4.27), (4.28), (4.29), (4.30) is a partial geometry (r, k, r−1). The geometric association
scheme with characteristics (r, k, r−1) is identical with the $L_r$ association scheme. It
is easy to verify that by putting t = r−1 the formulae (7.3), (7.4), (7.5) reduce to
(4.27), (4.28), (4.29).
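The lattice example can be checked by brute force for small cases. The sketch below is an illustration under stated assumptions, not the construction of Section 4(d): it assumes k is prime and uses the standard Latin squares $L_m(i, j) = (mi + j) \bmod k$, which are mutually orthogonal for m = 1, ..., k−1. The function names are hypothetical.

```python
from itertools import product


def lattice_blocks(k, r):
    """Treatments and blocks of an r-replicate lattice design on k*k treatments.

    Assumes k is prime and 2 <= r <= k + 1. The replications are the rows, the
    columns, and the symbol classes of the squares L_m(i, j) = (m*i + j) mod k.
    """
    if not 2 <= r <= k + 1:
        raise ValueError("need 2 <= r <= k + 1")
    cells = list(product(range(k), repeat=2))
    keys = [lambda i, j: i, lambda i, j: j]
    keys += [lambda i, j, m=m: (m * i + j) % k for m in range(1, r - 1)]
    blocks = [frozenset(c for c in cells if key(*c) == s)
              for key in keys for s in range(k)]
    return cells, blocks


def a4_counts(cells, blocks):
    """For every non-incident (treatment, block) pair, count the treatments of the
    block that share some block with the treatment; return the set of counts."""
    counts = set()
    for p in cells:
        first_associates = set().union(*(b for b in blocks if p in b)) - {p}
        for b in blocks:
            if p not in b:
                counts.add(len(first_associates & b))
    return counts


if __name__ == "__main__":
    cells, blocks = lattice_blocks(5, 3)
    print(a4_counts(cells, blocks))  # a single value, r - 1 = 2
```

A single value in the returned set means that axiom A4 holds with that value of t; for the lattice design it is r−1, in agreement with the argument above.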
(ii) A BIB design D with b treatments, v blocks, k replications, block size
r, and λ = 1 is clearly a partial geometry (k, r, r). The dual design D*, viz. the singly
linked block design, based on the SLB scheme corresponding to D is therefore the
partial geometry (r, k, r) dual to the partial geometry (k, r, r). Hence a singly linked
block design with parameters (4.19), (4.20), (4.21), (4.22) is a partial geometry (r, k, r).
The geometric association scheme with characteristics (r, k, r) is identical with the SLB
association scheme, and in particular the geometric association scheme with characteristics
(2, m−1, 2) is identical with the triangular association scheme. It is easy to verify
that the formulae (7.3), (7.4), (7.5) reduce to (4.19), (4.20), (4.21) by putting t = r,
and to (4.6), (4.7), by putting t = r = 2, k = m−1.

(iii) We shall conclude by giving a rather less trivial example of a partial
geometry. Consider an elliptic non-degenerate quadric $Q_5$ in the finite projective space
PG(5, $p^n$). This quadric is ruled by straight lines, called generators, but contains no
plane. As shown by Primrose (1951) and Ray Chaudhuri (1959; 1962a) there are
$(s+1)(s^3+1)$ points and $(s^3+1)(s^2+1)$ generators in $Q_5$, where $s = p^n$. Each generator
contains s+1 points, and through each point pass $s^2+1$ generators. If P is a point
on $Q_5$ not contained in a generator l, then the polar space of P intersects l in a single
point P', and PP' is a generator of $Q_5$. It can be verified using theorems proved
by Ray Chaudhuri that PP' is the only generator through P which intersects l.
Considering the points and generators as points and lines, we have a partial
geometry $(s^2+1, s+1, 1)$. The corresponding PBIB design was obtained by Ray
Chaudhuri (1959, 1962), the special case s = 2 having been obtained earlier by Bose and
Clatworthy (1955).

In the same way one can show that the configuration of points and generators
on a non-degenerate quadric $Q_4$ in PG(4, $p^n$) is a partial geometry (s+1, s+1, 1)
where $s = p^n$. The corresponding design was first obtained by Clatworthy (1952,
1954).
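The point and line counts quoted in these examples can be compared with the general formula (7.3). The sketch below is a consistency check only. It assumes (7.3) reads v = k[(r−1)(k−1)+t]/t and obtains the number of lines from the counting identity vr = bk (each of v points lies on r lines, each line carries k points); the function name is hypothetical.

```python
def points_and_lines(r, k, t):
    """Numbers (v, b) of points and lines of a partial geometry (r, k, t).

    v is taken from the assumed form of (7.3); b follows from v * r = b * k.
    """
    numerator = k * ((r - 1) * (k - 1) + t)
    if numerator % t or (numerator // t * r) % k:
        raise ValueError("characteristics do not give integral counts")
    v = numerator // t
    return v, v * r // k


if __name__ == "__main__":
    # Triangular case (2, m-1, 2): m(m-1)/2 points and m lines.
    for m in (5, 8):
        print("triangular", m, points_and_lines(2, m - 1, 2))
    # Elliptic quadric Q5 with s = p^n: characteristics (s^2+1, s+1, 1).
    for s in (2, 3, 4):
        v, b = points_and_lines(s * s + 1, s + 1, 1)
        print("Q5", s, v == (s + 1) * (s ** 3 + 1), b == (s ** 3 + 1) * (s * s + 1))
```

For s = 2 this gives 27 points and 45 generators on $Q_5$.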
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8. PSEUDO-GEOMETRIC ASSOCIATION SCHEMES AND UNIQUENESS AND 
EMBEDDING THEOREMS 

A two class association scheme, which has the parameters (7.3), (7.4), (7.5)
and for which the inequalities

\[ 1 \le t \le r, \quad 1 \le t \le k \]

are satisfied is defined to be a pseudo-geometric association scheme with characteristics
(r, k, t). Thus a pseudo-geometric association scheme with characteristics (r, k, t)
has the same parameters as the geometric association scheme with characteristics
(r, k, t), viz., the association scheme of a partial geometry (r, k, t). However an association
scheme may be pseudo-geometric without being the association scheme of a
partial geometry (r, k, t).

In particular a two class association scheme with characteristics (r, k, r−1)
may be called a pseudo-$L_r$ scheme. It has the same parameters as an $L_r$ association
scheme.

Similarly a two class association scheme with characteristics (r, k, r) may
be called a pseudo-SLB scheme, and in particular a two class association scheme
with characteristics (2, m−1, 2) may be called a pseudo-triangular scheme.

(a) A subset of treatments of an association scheme G, any two of which
are first associates is defined to be a clique of G. When G is the association scheme of
a partial geometry there will exist in G a set Σ of distinguished cliques $K_1, K_2, \ldots, K_b$,
corresponding to the lines of the geometry satisfying the following axioms:

A*1. Any two treatments of G which are first associates are contained in one
and only one clique of Σ.

A*2. Each treatment of G is contained in r cliques of Σ.

A*3. Each clique of Σ contains k treatments of G.

A*4. If θ is a treatment of G not contained in a clique $K_i$ of Σ, there are
exactly t treatments in $K_i$ which are first associates of θ (i = 1, 2, ..., b).

Hence any association scheme G in which there exists a set Σ of cliques
$K_1, K_2, \ldots, K_b$, satisfying axioms A*1 to A*4 is a geometric association scheme with
characteristics (r, k, t), and we can base on it a PBIB design for which the second kind of
parameters are b, r, k, $\lambda_1 = 1$, $\lambda_2 = 0$, where b is given by (7.2).

One may consider two class association schemes in which there exists a set
Σ of cliques $K_1, K_2, \ldots, K_b$ satisfying one or more but not all of the axioms A*1, A*2,
A*3, A*4, and investigate under what additional conditions they will be geometric
association schemes, with characteristics (r, k, t). Thus the result obtained at the end
of Section 7(c) may be rephrased as: If there exists in an association scheme G, a set
Σ of cliques $K_1, K_2, \ldots, K_b$, satisfying the axioms A*1, A*2, A*3, and if r < k, then
G is a geometric association scheme with characteristics (r, k, t).


Again one can prove (Bose, 1963) the following theorem: Let G be a pseudo-geometric
association scheme with characteristics (r, k, t). If it is possible to find in G
a set Σ of cliques $K_1, K_2, \ldots, K_b$ satisfying axioms A*1, A*2, then
G is a geometric association scheme.
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(b) Generalizing a result of Bruck (1963) regarding the pseudo-L, scheme, 
Bose (1962; 1963) proved the following result : 


Let G be a pseudo-geometric scheme with characteristics (r, k, t), for which

\[ k > p(r, t) = \tfrac{1}{2}\,[\,r(r-1) + t(r+1)(r^2-2r+2)\,], \tag{8.1} \]

then G is a geometric association scheme.

In other words if (8.1) is satisfied, then we can find a set of cliques in G satisfying
the axioms A*1, A*2, A*3, A*4. Taking these cliques to be lines, and the
treatments to be points we get a partial geometry (r, k, t).
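The bound p(r, t) specializes to the values quoted in the following sub-sections. The sketch below assumes that (8.1) reads $p(r, t) = \tfrac{1}{2}[r(r-1) + t(r+1)(r^2-2r+2)]$, which is the form consistent with the special values 7 (for r = t = 2) and 4 (for r = 2, t = 1) stated in the text; the function name `p_bound` is hypothetical.

```python
from fractions import Fraction


def p_bound(r, t):
    """Assumed form of the bound p(r, t) in (8.1), as an exact fraction."""
    return Fraction(r * (r - 1) + t * (r + 1) * (r * r - 2 * r + 2), 2)


if __name__ == "__main__":
    print(p_bound(2, 2))  # triangular case r = t = 2
    print(p_bound(2, 1))  # L2 case r = 2, t = r - 1
    for r in range(2, 7):
        # t = r - 1 (pseudo-L_r) and t = r (pseudo-SLB) closed forms
        assert p_bound(r, r - 1) == Fraction((r - 1) * (r**3 - r**2 + r + 2), 2)
        assert p_bound(r, r) == Fraction(r * (r**3 - r**2 + r + 1), 2)
```

The two assertions inside the loop confirm the closed forms for t = r−1 and t = r for small r.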

(c) Consider the special case r = t = 2, k = m−1 of the theorem of sub-section
(b). Then p(r, t) = 7. Remembering the results of Section 7(e)(ii), it follows
that a pseudo-triangular association scheme, i.e., an association scheme with parameters
(4.6), (4.7), must be a triangular association scheme if m > 8. This result is due to
Connor (1954), who expressed it by saying that the triangular association scheme is
unique if m > 8.

Shrikhande (1959a) proved the uniqueness of the triangular association scheme
for m = 5, 6; and Chang (1959) and Hoffman (1960) proved the same for m = 7.


Both Hoffman (1960) and Chang (1960) have shown that for m = 8, the
parameters (4.6), (4.7) do not completely determine the association scheme. There
are three other possible schemes with the same parameters besides the triangular.
This may be expressed as follows:

There are four non-isomorphic two class association schemes (including the triangular)
with parameters

\[ v = 28, \quad n_1 = 12, \quad n_2 = 15, \quad P_1 = \begin{pmatrix} 6 & 5 \\ 5 & 10 \end{pmatrix}, \quad P_2 = \begin{pmatrix} 4 & 8 \\ 8 & 6 \end{pmatrix}. \tag{8.2} \]


Consider a BIB design with parameters

\[ v = \tfrac{1}{2}(m-1)(m-2), \quad b = \tfrac{1}{2}m(m-1), \quad r = m, \quad k = m-2, \quad \lambda = 2. \tag{8.3} \]

Hall and Connor (1953) have shown that if this design exists then it can be
embedded in a symmetric BIB design with parameters

\[ v = b = \tfrac{1}{2}m(m-1)+1, \quad r = k = m, \quad \lambda = 2. \tag{8.4} \]

One important consequence of Hall and Connor’s embedding theorem is that
the non-existence of (8.4) implies the non-existence of (8.3). For example if m = 10,
it was shown independently by Schutzenberger (1949) and ... (1950a, 1950b)
that the BIB design v = b = 46, r = k = 10, λ = 2 does not exist. This implies
the non-existence of the BIB design v = 36, b = 45, r = 10, k = 8, λ = 2.
Hall and Connor’s proof does not cover the case m = 8 for which Connor
(1951, 1952) separately showed the non-existence of (8.3). Shrikhande has
given a very simple proof of the Hall and Connor theorem, when m ≠ 8, using the
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uniqueness of the triangular scheme. It is interesting to observe that m = 8, the 
case not covered in Hall and Connor’s entirely different proof is exactly the case when 
the parameters (4.6), (4.7) do not uniquely characterize the scheme as triangular. 
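The numerical values in this sub-section can be reproduced from the general formulae. The sketch below is illustrative and rests on assumptions: (7.3), (7.4), (7.5) are taken in the form $n_1 = r(k-1)$, $n_2 = (r-1)(k-1)(k-t)/t$, $p^1_{11} = (t-1)(r-1)+k-2$, $p^1_{12} = (r-1)(k-t)$, $p^1_{22} = (r-1)(k-t)(k-t-1)/t$, $p^2_{11} = rt$, $p^2_{12} = r(k-t-1)$, with $p^2_{22}$ obtained from the row-sum identity $p^2_{21} + p^2_{22} = n_2 - 1$. The design parameters (8.3) and (8.4) are taken as v = (m−1)(m−2)/2, b = m(m−1)/2, r = m, k = m−2, λ = 2 and v = b = m(m−1)/2 + 1, r = k = m, λ = 2. All function names are hypothetical.

```python
def exact(a, b):
    """Integer quotient a / b, insisting that the division is exact."""
    q, rem = divmod(a, b)
    if rem:
        raise ValueError("parameters are not integral")
    return q


def geometric_scheme(r, k, t):
    """(v, n1, n2, P1, P2) for characteristics (r, k, t), assumed forms of (7.3)-(7.5)."""
    v = exact(k * ((r - 1) * (k - 1) + t), t)
    n1 = r * (k - 1)
    n2 = exact((r - 1) * (k - 1) * (k - t), t)
    p1_12 = (r - 1) * (k - t)
    p1 = [[(t - 1) * (r - 1) + k - 2, p1_12],
          [p1_12, exact((r - 1) * (k - t) * (k - t - 1), t)]]
    p2_12 = r * (k - t - 1)
    p2 = [[r * t, p2_12], [p2_12, n2 - 1 - p2_12]]
    return v, n1, n2, p1, p2


def hall_connor_pair(m):
    """Parameters (v, b, r, k, lambda) of the design (8.3) and of the symmetric design (8.4)."""
    half = m * (m - 1) // 2
    return ((m - 1) * (m - 2) // 2, half, m, m - 2, 2), (half + 1, half + 1, m, m, 2)


if __name__ == "__main__":
    print(geometric_scheme(2, 7, 2))   # triangular scheme with m = 8, compare (8.2)
    print(hall_connor_pair(10))        # the case m = 10 discussed in the text
```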


(d) Now consider the special case t = r−1, of the theorem of sub-section
(b). Then

\[ p(r, t) = \tfrac{1}{2}(r-1)(r^3-r^2+r+2). \tag{8.5} \]

In particular if r = 2, p(r, t) = 4. Remembering the results of Section 7(e)(i), it follows
that a pseudo-$L_r$ scheme, i.e., an association scheme with parameters (4.27), (4.28),
(4.29) is an $L_r$ scheme provided that

\[ k > \tfrac{1}{2}(r-1)(r^3-r^2+r+2). \tag{8.6} \]

In other words if (8.6) is satisfied, and a two class association scheme with
parameters (4.27), (4.28), (4.29) exists, then we can arrange the $k^2$ treatments in a k×k
square scheme, and find a corresponding set of r−2 mutually orthogonal Latin squares,
such that two treatments are first associates if they occur together in the same row
or column of the square scheme, or correspond to the same symbol of one of the Latin
squares, and are second associates otherwise. This result may be expressed by saying
that the $L_r$ association scheme is unique when r = 2 and unique up to type when r > 2,
if (8.6) holds. In the case r > 2, it is necessary to add the words unique up to type
because in general there will exist many non-isomorphic sets of r−2 mutually orthogonal
Latin squares. The uniqueness result for the $L_2$ scheme is due to Mesner (1956)
and Shrikhande (1959b), and for the general case $L_r$ (in a different language) to Bruck
(1963). A slightly weaker result for the uniqueness of the $L_r$ scheme was proved
earlier by Mesner (1956), the bound (8.6) due to Bruck being sharper than Mesner's
bound.


(e) Again let us consider the special case t = r of the theorem of sub-section
(b). Then

\[ p(r, t) = \tfrac{1}{2}\,r(r^3-r^2+r+1). \tag{8.7} \]

Remembering the results of Section 7(e)(ii), it follows that a pseudo-SLB
scheme, i.e., an association scheme with parameters (4.19), (4.20), (4.21) is an SLB
scheme if

\[ k > \tfrac{1}{2}\,r(r^3-r^2+r+1). \tag{8.8} \]


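In other words if (8.8) is satisfied, then there will exist a singly linked block design
D* (the dual of a BIB design D with λ = 1), such that two treatments are first associates
if they occur together in one block of D* and second associates if they do not occur
together in a block. This result may be expressed by saying that the SLB
scheme is unique up to type if (8.8) is satisfied. It is necessary to add the
words unique up to type because there may exist non-isomorphic BIB designs with
the same parameters.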
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(£) It is now clear that the result of sub-section (b) can be viewed as a genera- 
lized uniqueness theorem. We may say that if there is a two class association scheme 
with parameters (7.3), (7.4), (7.5), then if 


\[ k > \tfrac{1}{2}\,[\,r(r-1) + t(r+1)(r^2-2r+2)\,], \tag{8.9} \]


then the association scheme is unique up to type, in the sense that there will exist a partial 
geometry (r, k, t) such that two treatments will be first associates if and only if the corres- 
ponding points are incident with the same line of the geometry. 

(g) It is well known that there cannot exist more than k−1 mutually orthogonal
Latin squares of order k. A set of k−1 mutually orthogonal Latin squares of
order k is called a complete set.

Given a set S of r−2 mutually orthogonal Latin squares of order k, we
may define

\[ d = (k-1)-(r-2) = k-r+1 \tag{8.10} \]

to be the deficiency of the set. If there exist d new Latin squares, such that when
added to the original set S of r−2 mutually orthogonal squares, all the squares of the
extended set are orthogonal, then it would be possible to extend the set S to a complete
set.

Shrikhande (1961) proved that if k ≠ 4, then a set of mutually orthogonal Latin
squares of order k and deficiency 2 can be extended to a complete set. Bruck (1963)
generalized this result and showed that if

\[ k > \tfrac{1}{2}(d-1)(d^3-d^2+d+2) \tag{8.11} \]

then a set of mutually orthogonal Latin squares of order k and deficiency d can be extended
to a complete set.

We shall here prove the following generalized embedding theorem and derive
from it the results of Shrikhande and Bruck.
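The deficiency (8.10) and Bruck's sufficient condition (8.11) are easy to tabulate. The sketch below is illustrative; it assumes (8.11) reads $k > \tfrac{1}{2}(d-1)(d^3-d^2+d+2)$, and the function names are hypothetical. The condition is only sufficient: for d = 2 it requires k > 4, whereas Shrikhande's result quoted above covers every k ≠ 4.

```python
from fractions import Fraction


def deficiency(k, r):
    """Deficiency (8.10) of a set of r - 2 mutually orthogonal Latin squares of order k."""
    return (k - 1) - (r - 2)


def bruck_bound(d):
    """Right-hand side of (8.11), in the assumed form (d-1)(d^3-d^2+d+2)/2."""
    return Fraction((d - 1) * (d**3 - d**2 + d + 2), 2)


def extension_guaranteed(k, r):
    """True when (8.11) holds, so that extension to a complete set is guaranteed."""
    return k > bruck_bound(deficiency(k, r))


if __name__ == "__main__":
    for d in range(1, 6):
        print("deficiency", d, "bound", bruck_bound(d))
```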
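Given a two class association scheme G with the parameters

\[ v = k[(d-1)(k-1)+t]/t, \quad n_1 = (d-1)(k-1)(k-t)/t, \quad n_2 = d(k-1), \tag{8.12} \]

\[ P_1 = \begin{pmatrix} (d-1)(k-1)(k-t)/t - d(k-t-1) - 1 & d(k-t-1) \\ d(k-t-1) & dt \end{pmatrix}, \tag{8.13} \]

\[ P_2 = \begin{pmatrix} (d-1)(k-t)(k-t-1)/t & (d-1)(k-t) \\ (d-1)(k-t) & (t-1)(d-1)+(k-2) \end{pmatrix}, \tag{8.14} \]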

with r replications, block size k, 


a scheme, ka 
dding new blocks containing the same treat- 


BIB design, with ty = 7--dA,— Àa) 


n this association 


d the design by @ 
ded design 48 @ 
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CLUSTER VALUES OF SEQUENCES OF 
ANALYTIC FUNCTIONS 


By J. L. DOOB 
University of Illinois 


SUMMARY. Let $f_n$ be a function meromorphic on the unit disc and let $\Gamma_n$ be a subset of the
disc perimeter. Let $F_n$ be a suitably defined cluster set of $f_n$ on $\Gamma_n$. In 1933 the author derived relations
between the cluster values of $f_n$ and $F_n$ sequences under the restriction that $\Gamma_n$ was an arc. In the
present paper these results are extended, with necessarily weaker conclusions, to the case in which $\Gamma_n$ is a
Lebesgue measurable set.


1. INTRODUCTION 


Let $f_n$ be a function meromorphic on the unit disc {z : |z| < 1} and let $\Gamma_n$ be
a subset of the unit disc perimeter, Lebesgue measurable with measure $|\Gamma_n| > 0$.
Let R be the range of the $f_n$ sequence, that is the set of those values taken on by
infinitely many functions of the sequence. Let $B^1\{(f_n, \Gamma_n)\}$ be the boundary limit set
along the $\Gamma_n$ sequence, that is the set of those values α for which there is a sequence
of integers $a_1 < a_2 < \cdots$ and a sequence $\{\alpha_{a_n}, n \ge 1\}$ with limit α for which $\alpha_{a_n}$
is a limiting value (cluster value) of $f_{a_n}$ at some point of $\Gamma_{a_n}$. Relations were found
by the author (Doob, 1933) between R, $B^1$ and certain limiting values of the $f_n$ sequence
(those obtained considering $f_n$ only near $\Gamma_n$ in a certain sense) under the hypothesis
that $\Gamma_n$ is an open arc for all n. In this paper analogous results will be obtained in
the general case.


The results will be applied to find relations between the cluster set at a boundary
point of a function meromorphic on the unit disc, the local range of the function near
the point, and the boundary cluster set of the function along a measurable subset
Γ of the disc perimeter metrically dense at the point. The relations found generalize
previous work (Doob, 1963) in which Γ is an arc and work by other authors (see Noshiro,
1960 for detailed references) in which Γ is the perimeter less the perimeter point in
question except for a set of capacity 0 or measure 0. The case in which only the
perimeter point in question is excluded is classical.

All our definitions and results will be invariant under linear transformations
of the unit disc onto itself, that is, if $L_n$ is such a transformation and if the $(f_n, \Gamma_n)$
sequence is replaced by the $(f_n(L_n), L_n^{-1}\Gamma_n)$ sequence, none of the cluster sets or range
sets will be affected by the change.
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2. QENERALIZED HARMONIC MEASURE 


If D is an open plane set with boundary D’ of strictly positive capacity and
if Γ is a plane set, define μ(·, Γ; D), the ‘reduite’ of 1 on Γ in Brelot's terminology
(1956), as the superharmonic function on D which is equal except on a set of zero capacity
to the lower envelope of the class of superharmonic functions v on D which satisfy the
following conditions.

(a) v ≥ 0;
(b) v(z) ≥ 1 if z is in Γ ∩ D;
(c) v(z) ≥ 1 if z is in the set which is the intersection with D of some neighbourhood
of Γ ∩ D’.


The function p(-, T; D) is harmonic on the set of inner poris gi, 
The function Elz, +; D) is monotone increasing and countably Aa ee ed i 
Me, T; D) is the outer harmonic measure of T relative to D, at z. H icy z k 
T is a compact subset of D, ut, T; D) is the outer capacitary potential o i . Ke ar 
bability language, if T is a Borel set, n(z, T; D) is the probability of on re eiki 
Brownian paths from z which either meet I before leaving D or meet D i i 

time in a point of I. (If Disa disc, FAND’ need only be Lebesgue measurable. 


F E f zero 

The function 4C, T; D) has value 1 on [AD except possibly an a = to D' 

capacity, and has limit 1 along almost every Brownian path from a poito ngë 

which meets D’ in a point of I. If D is simply connected, the last assertion ei ail 
valent to the assertion that n(, T; D) has limit 1 at almost every (harmonic mea 

on D') point of TQ) D’ on approach in the potential-theoretic fine topology. 

If D is the unit dise, we omit it from the not: 

Let T be a Lebesgue-measur 

positive measure, and let H be th 

constant c < 1. Then each ope 


ation, writing simply (z, I). 


able subset of the unit dise perimeter, of — 
e set of those points z at which nte, T) > c for so i 
n component of H is simply connected and r“ i 
ctly positive measure on its boundary. Moreover u, T; H) > 4 tly 
almost every point z of T some open component of H includes the interior os i 
close to z) of any angle formed by rays from z into the unit disc. This follows fro a 
Fatou’s theorem, TË g is a function on an open component H, of H, bounded we 
regular, g has a non-tangential limit at almost every (Lebesgue measure) point i JE 
T which is the vertex of an angle whose points sufficiently near z, except for z itsell, 
lie in H. Moreover, the is at most the maximum of the supremum 
of its cluster values on t undary of H, in the dise and the aa 
ty limit function on T. (Here ‘essentia 

up to value on a set of measure 0.) end 
ing the standard methods applied in discussing the Lusin- 
r, the same statements except that ‘non- 
because the Lusin-Privaloy-Plessner results 
discussion (Doob, 1961) 
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facts can be derived us 
Privalov and Plessner t 
tangential’ is replaced 
are valid with these ch 


heorems. Moreoy, 


by ‘fine’ are true 
Anges in the usual 
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The following lemma will be fundamental. Let I, be a Lebesgue-measurable 


subset of the unit disc perimeter, with $\lim_{n\to\infty} |\Gamma_n| = 2\pi$, so that $\lim_{n\to\infty} \mu(\cdot, \Gamma_n) = 1$
uniformly on every compact subset of the unit disc. Let $H_n(\varepsilon)$ be the open component
of the set $\{z : \mu(z, \Gamma_n) > 1-\varepsilon\}$ containing 0. Since every compact subset of the
unit disc lies in $H_n(\varepsilon)$ for n sufficiently large, depending on the compact subset, the
following lemma makes sense.
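The uniform convergence of the harmonic measure to 1 on compact subsets can be illustrated numerically in the simplest case, in which each set is an arc. The sketch below is an illustration under stated assumptions and plays no part in the proof: the harmonic measure on the unit disc is computed as the Poisson integral of the indicator of the arc, using a midpoint rule, and the arcs $\{e^{i\theta} : 1/n \le \theta \le 2\pi - 1/n\}$ are a hypothetical example of a sequence whose measures tend to 2π. The function names are hypothetical.

```python
import math


def harmonic_measure(z, arc_start, arc_end, steps=4000):
    """Harmonic measure at z (|z| < 1) of the arc {e^{i*theta}: arc_start <= theta <= arc_end},
    approximated by a midpoint rule applied to the Poisson integral."""
    h = (arc_end - arc_start) / steps
    total = 0.0
    for j in range(steps):
        theta = arc_start + (j + 0.5) * h
        w = complex(math.cos(theta), math.sin(theta))
        total += (1.0 - abs(z) ** 2) / abs(w - z) ** 2   # Poisson kernel
    return total * h / (2.0 * math.pi)


def min_measure_on_disc(n, radius=0.5, samples=12):
    """Smallest sampled value of the harmonic measure of the n-th arc on |z| <= radius.

    The measure is harmonic in z, so its minimum over the closed disc is attained
    on the circle |z| = radius; the sample includes the point nearest the gap.
    """
    start, end = 1.0 / n, 2.0 * math.pi - 1.0 / n
    points = [0j] + [radius * complex(math.cos(a), math.sin(a))
                     for a in (2.0 * math.pi * i / samples for i in range(samples))]
    return min(harmonic_measure(z, start, end) for z in points)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, min_measure_on_disc(n))
```

The printed minima increase towards 1 as n grows, as the uniform convergence on the disc of radius 1/2 requires.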


Lemma 2.1: Under the preceding hypotheses, $\lim_{n\to\infty} \mu(\cdot, \Gamma_n; H_n(\varepsilon)) = 1$ uniformly
on every compact subset of the unit disc.
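It is sufficient to prove the assertion for a sufficiently sparse subsequence.
Hence we can and shall suppose that $\sum_n (2\pi - |\Gamma_n|) < \infty$. Define $\Gamma'_n = \bigcap_{m \ge n} \Gamma_m$.
Then $\Gamma'_1 \subset \Gamma'_2 \subset \cdots$, $|\Gamma'_n| \to 2\pi$ and $\lim_{n\to\infty} \mu(\cdot, \Gamma'_n) = 1$ uniformly on every compact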


subset of the unit dise. It will be convenient to prove the lemma by probability 
methods. Consider those Brownian paths from 0, a path set of probability 2(0, T), 
which meet the unit disc perimeter for the first time in a point of Ti. For each value 
of n, ul, T)— 1 on almost every Brownian path from 0 to T, and we restrict our 


attention to those paths which meet the perimeter for the first time at a point of 
U, I and for which the preceding limit relation is true simultaneously for all sufficiently 
TË n > k the function pe, T) 


large n. The set of these paths has probability 1. 
considered as a function of the Brownian path parameter on the closed interval from 
0 to the time the path first meets I, defining the function as lat the parameter 
interval endpoint, is continuous for sufficiently large n and increases vvith 5 si x 
Hence the convergence to 1 is uniform, and we conclude that the pi pati i i 
lies in H,,(e) (except for the endpoint on T4) for sufficiently lange Kë E y = 
proved that almost every Brownian path from 0 to the dise aa th ae E a 
(except for the endpoint on the perimeter) for sufficiently y y i ea = i ao 
is the probability measure of the set A, of pee Laat awe Seats aa 
oS MET pr e TË T PE ia from the dise center. 
proved is that lim inf A, includes almost all Brownian p af ; 

ari ifor: very compac 

E a E: H.) = 1 at 0 and hence uniformly in every p 

We infer that Jim al , Të Hale) 


arnack inequality), as was to be proved. 


subset of the unit dise (H 
3. BOUNDARY LIMIT SETS OF AN $(f_n, \Gamma_n)$ SEQUENCE
n fa iS meromorphic on the unit dise and T, is a 
n 


: a ositive Lebesgue measure. In Section 1 
subset of the unit disc mey va pri që along the I’, sequence. Several 
a cluster set B fn ni Let EN fas End} be the set of those values a for 
Kas ana u < ë, ane and a sequence {£an n > 1} with 
ga j lea ase nontangential sequence to some point 
kand gë with vertex at the point, opening into the 
T the angle chosen may depend on n. 


In the following, the functio. 


we defined 
other such sets will no 
which there are a seque! 
limit æ for which Gen i the 
of Pap At each point of Ca i 
dise. If a point lies in severa 


choose 
| boundary sets Ts 
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Zu, is to be the limit 0 
j d except that xu, is À Bed 
ike the cluster set just define fe ie in the specificc 
Aig “NË epee ail int of Tz, where now the sequence is to lie in $ a nae 
fan along a sequence ide eo < D fine BU for Ta) and BU fa, TO) entail y zë f fi 
da pA vë and fae B® xa, is to be the fine limit 8 fan 
+ BI Zan is to be a fine cluster value, a imit. (See Doob (1961) for a 
i. phat of Te, at which this function has a fine ven = five cluster sets 
A 3 s 0 may be empty. E 
discussion of fine cluster values.) The set B y T} = NBAS, T%)} where the 
we have defined are all closed. Finally, define By{(fa, Ty) rij i, Tha eter 
intersection is over all Ty, sequences for which I,CT,, and [Pa e (fr, Tn) from the 
ae denoted without superscripts are defined similarly, and we aon ' ad closed, there 
eas if there is no ambiguity. Since all the cluster sets sie for the other 
j i= By 7, and the same is tr $ 
is a T; sequence making BY{(f,, DY) = Bii(fa: Ta) BDB, DB, DB, DB. The 
four types of cluster sets. Under our definitions, B, : 2 zonë HË 
: : u is tr ause if fn has a nontangentis 
first inclusion is obvious. The second is true e ik B= Be intall vanë 
i is suffici arge, = K 
almost every point of I, when n is përd aig = faxtendiod gland awoii 
if the fa sequence does not have this property By = 3 ni 3 sim Yeahs dah thus Hee 
sner’s theorem. The inclusion B, D By follows from ia ely lt 
oo h it di ct metric space is not o 
cluster set at z of any function from the unit dise to a compa SR a 
for almost every point z of the perimeter, included in the nontangentia keti 
or almos YP i ined using a specified angle a 
at z, but even in the nontangential cluster set obtained usi gas ni a e 
pointz. (The weaker statement is Theorem 4.3 of Doob (1961); the proo oon ws 
actually proves the stronger statement.) The inclusion B; D By may 
fact Constantinescu and Cornea (1960) h 
function on the unit disc which does not 
the perimeter but which has 


ave given an example of a nama 
have a nontangential limit at any A of 
(in our terminology) a fine limit at almost every Ar 
the perimeter. Tf fa is this function for all n, and if P, is arbitrary, = l 
plane, whereas B, can be made properly smaller if T,, is chosen suitably. auf the fh 

The theorems of the present paper are all assertions about the nature 5 a 
Sequence as related to the complement of a boundary limit set of the (fn, Th) që — 
The theorems are true for all ten of the boundary limit sets defined above, bu 


8 : The relation By D B 
strongest with the choice B, for which the proof will be given. TORN 
may be strict. Accordin 


A 's theorem 
g to the fine cluster value generalization of Plesen ae 
(Doob, 1961) either Jn has a fine limit at almost every point of Dy when n is su 
large, in which case 


> lane 
a = B trivially, or not, in which case B, is the extended p 
and B may be empty. l 


4. RELATIVE CLUSTER AND RANGE SETS OF AN $(f_n, \Gamma_n)$ SEQUENCE

If $0 < \omega \le 1$, α will be called an ω-cluster value of an $(f_n, \Gamma_n)$ sequence if there
is a strictly increasing sequence $\{a_n, n \ge 1\}$ of positive integers and a sequence
$\{z_{a_n}, n \ge 1\}$ in the unit disc for which

\[ \lim_{n\to\infty} f_{a_n}(z_{a_n}) = \alpha, \quad \inf_n \mu(z_{a_n}, \Gamma_{a_n}) > 1-\omega. \tag{4.1} \]

The set of ω-cluster values will be denoted by $C\{(f_n, \Gamma_n); \omega\}$ or simply by C(ω) if
there is no ambiguity. We also define $C(0) = \bigcap_{\omega > 0} C(\omega)$. A point α is in C(0) if and
only if $a_n$, $z_{a_n}$ can be found as above except that the second condition in (4.1) is replaced
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by li z = $ i i 
y : at ans Van) = 1. The set C(0) is obviously closed. As w increases C() increases 


(ide sense). If (4.1) is modified by allowing equality in the second condition, the 
a Clo) so defined is closed and C(c0) C Qo) C Ce) for e >. Thus Clo) an 
” set. We shall prove that the part of this set not in Bi adi 
a a j. 1S O Pë ai Pi ai 
pi ës pen, under certain restric- 

If $0 < \omega \le 1$, α will be said to be in the ω-range of the $(f_n, \Gamma_n)$ sequence if
there is a strictly increasing sequence $\{a_n, n \ge 1\}$ of positive integers and a sequence
$\{z_{a_n}, n \ge 1\}$ in the unit disc for which

\[ f_{a_n}(z_{a_n}) = \alpha, \quad \inf_n \mu(z_{a_n}, \Gamma_{a_n}) > 1-\omega. \tag{4.2} \]

The ω-range will be denoted by $R\{(f_n, \Gamma_n); \omega\}$ or by R(ω) if there is no ambiguity,
and we also define $R(0) = \bigcap_{\omega > 0} R(\omega)$. As ω increases, R(ω) increases (wide sense),
and R(ω) ⊂ C(ω).

If $L_n$ is a linear transformation of the unit disc onto itself, the $(f_n(L_n), L_n^{-1}\Gamma_n)$
sequence has the same ω-cluster set and ω-range as the original sequence.


5. COHESIVENESS 
a sequence of Lebesgue-measurable subsets of the unit 


Let {an n > 1} be a strictly increasing sequence of integers and let 
ce of points in the unit disc. Let Aj(d) be the subset of the 
TO < oœ < landif the conditions 


(5.1) 


Let (T,,n > 1) be 
disc perimeter. 
Zap n > 1} be a sequen 
perbolie distance less than d from zg. 


unit disc at hy 
inf (Za, Va 1—o, lim inf sup LE, ASH 
in Plans Tan) > Peri stag gë pa 
akly c-cohesive for the za, sequence. 
for which p(z, Dy) > £ and let 
We shall call the T, > 


are satisfied, the I”, sequence will be said to be we 


If 0 < £ < plz, Tp) let He) be the set of values of z 
A,(d, £) be the open component containing % of A,(d) N Hle). 


sequence w-cohesive for the za, sequence if 
—o, lim inf sup 


F ulz, Tan) = 1 a (5.2) 
inf plan Dan) > 1 yoo n zeda (d 1-07 _ 


and sufficiently small. 
akly] o-cohesive for the cluster 
eis [weakly] w-cohesive and 


when w—w’ is strictly positive 


an (f I'n) sequence is [we 


value æ if there is a Zan sequence for which the ins sequence : i: À 
for which fajlzan) > % If in addition to [weak] c-oahësiveness 9” ai ën 
An\ an j 


a set G with the property that to each point A in q there corres- 
1} of fan > 1}, a sequence Gow n > Min the fmit disc, 
sh [20,640,(9)] Zb,6Ab, (9; 1—o’) if o—o”' is strictly 
varies) and fon(2b,) > p, we shall say 
Itaneously for all points of G. 
nts of @ starting from any 


We shall say that 


Zan sequence there is 
ponds a subsequence fbm n > 5 
and a positive number d for whi Aer 
ne M m H 

positive and sufficiently 5 (unitorm $ i DË 
that ti T) sequence ja [weakly] G-souee™ n ë 

nat the ts n) seq ohesive for all poi 
The sequence will then be [weakly] @-con® 


point of G, not just from &. 


mall 
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If L, is a linear transformation of the unit dise onto itself, the T, rë sa 
[weakly] o-cohesive for the za, sequence if and only if the L,(I”,) sequence i for the 
L,(2a,) sequence. Thus it will usually be possible to assume that Za, = 0 for all n: 
In this case the w-cohesiveness condition becomes 


inf (Da) > 1—o, lim inf sup” (z, Lan) = 1 e (5.8) 
n m1 n 


where the primed supremum is for z in the open component containing ne 
e: lej < NNHal—o') and the relation is to be true for o—o' strictly positive 
and sufficiently small. 

It will be useful to reformulate (5.3) in a different form. The sequence 
tul, Pan), n > 1) of harmonic functions is compact in the sense that every subsequence 
contains a further subsequence which is locally uniformly convergent. Let u be any 
limit function of a subsequence and define H(e) as the open component of {z : w(z) > £} 
containing 0. Then the second condition of (5.3) is equivalent to the condition that 
u have supremum 1 on H(1—o) (for every limit function u). If H(1—o) here is inter- 


preted as the unit disc itself, the condition just formul 


ated becomes that for weak 
«-cohesiveness. 


Changing each set I’, by a set of measure 0 has no effect on weak w-cohe- 
siveness or o-cohesiveness. If PSH 


++ the T,, sequence is co-cohesive for 
every Zan Sequence with sup |zu,| < 1. 
n 


If the T,, sequence is a sequence of ares, 


it is w-cohesive for every Zan Sequence satisfying the first condition in (5.2). 


6. A RELATION BETWEEN B AND C(0) 


Theorem 6.1: For any $(f_n, \Gamma_n)$ sequence, B contains the boundary of C(0).


This theorem was proved by Doob (1933) with $B^1$ instead of B, and the $\Gamma_n$
sequence a sequence of arcs, and the following proof applies the same method. Let
α be a point of C(0) not in B. We shall prove that then some neighbourhood of α
also lies in C(0). Going to a subsequence and making linear transformations of the
disc onto itself if necessary, we can suppose that $f_n(0) \to \alpha$, $|\Gamma_n| \to 2\pi$. If the $f_n$ sequence
is not normal, the sequence is not normal at some point of the unit disc. Then every
value except possibly two in the extended plane is taken on by infinitely many functions
of the $f_n$ sequence in an arbitrary neighbourhood of the point of non-normality. It
follows that C(0) is the extended plane in


this case. Tf the f, 
ee eT p . Finally, suppose that f, > % 
3d be fE PAE » replacing f, by l/f, otherwise. = 
distance > 2d from B Let H If 2 is a point not in C(0), with |e—f| < d, p is a 
Then H a pis në) be the subset of the unit disc on which pt, P,)>1— E: 

n(e) contains the ongin if € > 0 and nis large. If for every € > 0 there is 
E Sequence (ay, z'an), n > 1} for which US Gr <..., 2'a,¢Ha(), and dalta) Ps 
then feO(0), contrary to hypothesis, Hence there us ha a £ and a K Di mish, 


for sufficiently large n, 
lfn—2 | SK, on He). 
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But then, if 2 is sufficiently large, 
BI Su, Ty He) 2d-- KUL—nt, TË: Hy(e))) on H,(€) (6.1) 


because the left side of (6.1) is a bounded i i 

nuous boundary values ae by ae ee ae oe Hes 
ried pints of He) në the unit disc, and with fine limits (as well a pe 
its) existing and dominated by those of the harmonic function on the right at 
gn all points of T, (see Section 2). When n—o00 the bracketed expression in (6.1) 
goes to 0, according to Lemma 2.1, at each point of the disc. Hence, setting the argu- 
ment equal to 0 in (6.1) this inequality yields, when n—300, ja—B|+ < 1/24 ei 
contradicts the conditions imposed on A. There can therefore be no such point A: 


some neighbourhood of x lies in C(0), as was to be proved. 


7. RELATIONS BETWEEN B AND C(ω)

In the following, the complement of a set $X$ will be denoted by $\tilde{X}$.

Theorem 7.1: If the $(f_n, \Gamma_n)$ sequence is $\omega$-cohesive for the cluster value $\alpha$ in $\tilde{B}$, then $C(\omega)$ contains a neighbourhood of $\alpha$ and, if $\alpha \notin C(0)$, the $(f_n, \Gamma_n)$ sequence is even simultaneously $\omega$-cohesive for a neighbourhood of $\alpha$.

According to this theorem, the set of $\omega$-cluster values in $\tilde{B}$ for which the $(f_n, \Gamma_n)$ sequence is $\omega$-cohesive is open. The simplest application of the theorem is to the case when the $(f_n, \Gamma_n)$ sequence is $\omega$-cohesive for every value in $C(\omega)$. This is true for example, irrespective of the $f_n$ sequence, when $\Gamma_n$ is an arc for all $n$, the case treated by Doob (1933). In this case $C(\omega) \cap \tilde{B}$ is open.

Theorem 7.1 is a trivial consequence of Theorem 6.1 if $\alpha \in C(0)$. Hence we shall exclude this case in the following proof. Replacing the $(f_n, \Gamma_n)$ sequence by a subsequence and making linear transformations of the unit disc onto itself, if necessary, we can assume that the $\Gamma_n$ sequence is $\omega$-cohesive for the sequence $0, 0, \ldots$ and that

$$\inf_n \mu(0, \Gamma_n) > 1-\omega, \qquad \lim_{n\to\infty} \mu(\cdot, \Gamma_n) = u, \qquad \sup_H u = 1. \tag{7.1}$$

Here $u$ is harmonic in the unit disc, the convergence to $u$ is locally uniform, and $H = \{z : u(z) > 1-\omega\}$.

(a) If the restriction of the $f_n$ sequence to $H$ is normal, and if some limit function of the sequence is not identically $\alpha$, the restriction of the sequence to a neighbourhood of 0 has some neighbourhood of $\alpha$ as limit set. Hence the $(f_n, \Gamma_n)$ sequence is simultaneously $\omega$-cohesive for the points of a neighbourhood of $\alpha$.

(b) If the restriction of the $f_n$ sequence to $H$ converges locally uniformly to $\alpha$, then $\alpha \in C(0)$ because of the last relation in (7.1). We have already treated and excluded this case.

(c) Finally, if the restriction of the $f_n$ sequence to $H$ is not normal, there is a point of non-normality in $H$ and we deduce at once that $C(\omega)$ is the extended plane, and in fact that the $(f_n, \Gamma_n)$ sequence is simultaneously $\omega$-cohesive for all the points of the extended plane.

Theorem 7.2: If the $(f_n, \Gamma_n)$ sequence is weakly $\omega$-cohesive for the cluster value $\alpha$ in $\tilde{B}$ and if $C(\omega)$ does not contain a neighbourhood of $\alpha$, then $C(\omega')$ is the extended plane for some $\omega' < 1$, and in fact the $(f_n, \Gamma_n)$ sequence is simultaneously weakly $\omega'$-cohesive for all the points of the extended plane.

The proof follows that of Theorem 7.1. Again we can exclude the case $\alpha \in C(0)$ because the theorem becomes trivial in that case. We follow through steps (a) and (b) of the preceding proof, interpreting $H$ as the unit disc. These steps need no change. In step (c) we can only conclude under the present hypotheses that, if $z'$ is a point of non-normality of the $f_n$ sequence on the unit disc and if $\omega'$ is so large that $\inf_n \mu(z', \Gamma_n) > 1-\omega'$, the $(f_n, \Gamma_n)$ sequence is simultaneously weakly $\omega'$-cohesive for every point of the extended plane.

We observe that in both this and the previous theorem there is a uniformity of the simultaneous cohesiveness in the sense that (see the definition in Section 5) the number $d$ can be chosen the same for all points of the set $G$ in question.


8. THE RANGE OF A MEROMORPHIC FUNCTION 


Theorem 8.1: Let $f$ be a meromorphic function on the unit disc and let $\Gamma$ be a Lebesgue-measurable perimeter set. Let $D$ be an open set of the extended plane with the property that for some subset $\Gamma'$ of $\Gamma$ with $|\Gamma-\Gamma'| = 0$, no point of $D$ is the fine limit of $f$ at a point of $\Gamma'$. Let $G$ be the subset of $D$ not in the range of $f$. Then

$$\mu(f(z), G; D) \le 1-\mu(z, \Gamma) \quad \text{if } f(z) \in D-G. \tag{8.1}$$

This theorem is well known in various forms, and we shall therefore only sketch a proof. We observe that if $G$ has zero capacity (8.1) becomes trivial because the left side vanishes, whereas if $G$ has strictly positive capacity $f$ takes the unit disc into a hyperbolic Riemann surface and so has a fine limit at almost every point of the disc perimeter (see Doob (1961) or, in a different terminology, Constantinescu and Cornea (1960)). Thus in the only interesting case $f$ has a fine limit at almost every point of $\Gamma$.

We can assume in proving (8.1) that $z$ is chosen with $f'(z) \ne 0$. Suppose that $D$ is as stated. (For the following type of reasoning see Doob (1961).) Then almost all those Brownian paths from $f(z)$ leaving $D$ first in a point of $G$ must have as inverse images under $f$ Brownian paths from $z$ to points of the disc perimeter in $\tilde{\Gamma}$, except that the latter paths may correspond only to initial segments of the former ones. This implies (8.1), in view of the remarks in Section 2 on Brownian paths.


Theorem 8.2: For every strictly positive $\gamma$ and strictly positive $r < 1$ there is an absolute constant $\delta(\gamma, r)$, where $\lim_{\gamma \to 2\pi} \delta(\gamma, r) = 0$, with the property that if $f$ is meromorphic on the unit disc, if $f(0) = 0$, and if at almost every point of a subset $\Gamma$ of the disc perimeter where $f$ has a fine limit the limit has modulus $\ge 1$, then the capacity relative to the unit disc of the subset $G_r$ of $\{z : |z| \le r\}$ not in the range of $f$ is $\le \delta(|\Gamma|, r)$.

Taking $D$ in Theorem 8.1 as the unit disc and applying that theorem we find

$$\mu(0, G_r; D) \le \mu(0, G; D) \le 1-|\Gamma|/2\pi. \tag{8.2}$$

This inequality implies the truth of the theorem, because the first term in (8.2) is the value at 0 of the capacitary potential of $G_r$ relative to the unit disc.


9. RELATIONS BETWEEN B, C(0), R(0) 


We consider again a function boundary-set sequence $\{(f_n, \Gamma_n)\}$ and its associated cluster and range sets.


Theorem 9.1: For any $(f_n, \Gamma_n)$ sequence, the set $C(0) \cap \tilde{R}(0) \cap \tilde{B}$ has capacity 0.

We first prove the theorem with $R(0)$ replaced by the in general larger set $R$. Let $D_0$ be an open component of $\tilde{B}$. According to Theorem 6.1, if $D_0$ contains a point of $C(0)$ then $D_0 \subset C(0)$. We show that in the latter case $R$ includes $D_0$ except perhaps for a set of zero capacity. Let $\alpha$ be a finite point of $D_0$. There are then a function subsequence $\{f_{a_n}, n \ge 1\}$ and a point sequence $\{z_{a_n}, n \ge 1\}$ in the unit disc for which

$$\lim_{n\to\infty} f_{a_n}(z_{a_n}) = \alpha, \qquad \sum_n [1-\mu(z_{a_n}, \Gamma_{a_n})] < \infty. \tag{9.1}$$

We can suppose that $f_{a_n}(z_{a_n}) \in D_0$ for all $n$. Let $B_n$ be the set of cluster values of $f_n$ at the points of $\Gamma_n$ and let $D$ be an open set containing $\alpha$ whose closure is a subset of $D_0$. It will be sufficient to prove that $R$ includes all of $D$ except perhaps a set of zero capacity. The set $B_n \cap D$ is empty if $n$ is sufficiently large. If $G_{n,\delta}$ is the set of points of $D$ not in the range of $f_n$ and at distance $> \delta$ from $\alpha$, then according to Theorem 8.1

$$\mu(f_{a_n}(z_{a_n}), G_{a_n,\delta}; D) \le 1-\mu(z_{a_n}, \Gamma_{a_n}). \tag{9.2}$$

The function $\mu(\cdot, G_{a_n,\delta}; D)$ is harmonic on the disc of center $\alpha$, radius $\delta$, and according to a Harnack theorem there is a constant $c$ for which

$$\mu(\alpha, G_{a_n,\delta}; D) \le c\,\mu(f_{a_n}(z_{a_n}), G_{a_n,\delta}; D)$$

if $n$ is so large that $|f_{a_n}(z_{a_n})-\alpha| < \delta/2$. Then for large $j$

$$\mu\Bigl(\alpha, \bigcup_{n \ge j} G_{a_n,\delta}; D\Bigr) \le c \sum_{n \ge j} [1-\mu(z_{a_n}, \Gamma_{a_n})], \tag{9.3}$$

and the sum is near 0 for large $j$. Hence $\mu(\alpha, \limsup_n G_{a_n,\delta}; D) = 0$, so that the indicated limit superior has zero capacity. Since the result is true for all $\delta > 0$, $\limsup_{n\to\infty} G_{a_n,0}$ has zero capacity: that is, $R$ includes all of $D$ except possibly a set of zero capacity.

We have now proved the assertion of the theorem with $R(0)$ replaced by $R$. To prove the theorem as stated, let $\varepsilon$ be a strictly positive number and let $H_n(\varepsilon)$ be the open component of the set $\{z : \mu(z, \Gamma_n) > 1-\varepsilon\}$ containing $z_n$. We suppose again that (9.1) is true. Applying linear transformations of the unit disc onto itself if necessary, we can suppose that $z_{a_n} = 0$, $n \ge 1$. According to Lemma 2.1,

$$\lim_{n\to\infty} \mu(z, \Gamma_{a_n}; H_{a_n}(\varepsilon)) = 1,$$

uniformly on every compact subset of the unit disc. The set $H_{a_n}(\varepsilon)$ is simply connected. Mapping it one-to-one and conformally onto the unit disc and applying the result already proved to the transformed function-set sequence we find that $C(0) \cap \tilde{R}(\varepsilon) \cap \tilde{B}$ has zero capacity. Since $\varepsilon$ is an arbitrary strictly positive number, Theorem 9.1 follows.


10. RELATIONS BETWEEN B, C(ω), R(ω)

Theorem 10.1: If the $(f_n, \Gamma_n)$ sequence is $\omega$-cohesive for the cluster value $\alpha$ in $\tilde{B}$, then $R(\omega)$ contains a neighbourhood of $\alpha$ except possibly for a set of zero capacity.

Replacing the $(f_n, \Gamma_n)$ sequence by a subsequence and making linear transformations of the unit disc onto itself, if necessary, we can assume that the $\Gamma_n$ sequence is $\omega$-cohesive for the sequence $0, 0, \ldots$ and that (7.1) is true. Let $D$ be the open component of $\tilde{B}$ containing $\alpha$. If $\alpha \in C(0)$ then $D \subset C(0)$ and $D \cap \tilde{R}(0)$ has zero capacity according to Theorems 6.1 and 9.1. Thus the theorem is true in this case. If we exclude this case and use the fact that $\sup_H u(z) = 1$, we find that no subsequence of the $f_n$ sequence can converge locally uniformly to $\alpha$ on $H$. We conclude that if the restriction to $H$ of the $f_n$ sequence is normal $R(\omega)$ includes a neighbourhood of $\alpha$ because no limit function can be identically $\alpha$. If the restriction to $H$ of the $f_n$ sequence is not normal, there is a point of non-normality in $H$ and we conclude that $R(\omega)$ is the extended plane less at most two points.

The proof of Theorem 10.1 is now complete. Actually we have proved a stronger result. In the first place we have proved that the $f_n$ sequence has range a neighbourhood of $\alpha$ close to any sequence for which the $(f_n, \Gamma_n)$ sequence is $\omega$-cohesive for $\alpha$ (the precise statement is the analogue for ranges of simultaneous $\omega$-cohesiveness). In the second place we have proved that if $D$ is any open component of $\tilde{B}$ then either $D \subset C(0)$ and then $\tilde{R}(0) \cap D$ has zero capacity, or $D \cap C(0)$ is empty and in that case, if $0 \le \omega \le 1$, the set of values of $\alpha$ in $D$ for which the $(f_n, \Gamma_n)$ sequence is $\omega$-cohesive is included in $R(\omega)$ except possibly for two points. If there is even one exceptional point, $R(\omega)$ is the extended plane less at most two points.

Theorem 10.2: Suppose that the $(f_n, \Gamma_n)$ sequence is weakly $\omega$-cohesive for the cluster value $\alpha$ in $\tilde{B}$. If $R(\omega)$ does not contain a neighbourhood of $\alpha$ there is an $\omega' < 1$ such that $R(\omega')$ includes the open component of $\tilde{B}$ containing $\alpha$ except possibly for a set of zero capacity.

Applying a now-familiar argument, we can assume that (7.1) is true, where $H$ is now interpreted to be the unit disc. The theorem is trivial if $\alpha \in C(0)$ because of Theorem 9.1, so we exclude this case below. Then no subsequence of the $f_n$ sequence can converge locally uniformly to $\alpha$. If the $f_n$ sequence is normal no limit function is identically $\alpha$; so $R(\omega)$ includes a neighbourhood of $\alpha$. If the $f_n$ sequence is not normal, $R(\omega')$ is the extended plane less at most two points, for some $\omega' < 1$, depending on the location of the points of non-normality.

11. APPLICATION TO LOCAL PROPERTIES OF A MEROMORPHIC FUNCTION


Let f be a function meromorphic on the unit dise, let T be a Lebesgue-measur- 
able perimeter subset and suppose that every neighbourhood of 1 contains a subset of 
T of strictly positive measure. Let B9(f, T) be the boundary limit set of f at the point 
l along T, that is, the set of those values æ for which there is a sequence fan, n > 1} with 


limit æ such that æn is a cluster value of f at some point zn of T and lim z,=1. 
des) 


Define BY(f, I’) in the same way except that ay is a nontangential cluster value, BFD) 
in the same way except that æn is a fine cluster value, Bf, D) in the same way except 
that æn is the fine limit of f at some point zn of T where this fine limit exists. If for 
each point of T an angle is chosen, opening into the unit disc, with vertex àt that point, 
and if aji above is a cluster value of f at z, along a sequence lying in the angle assigned 
to Zn, the boundary cluster set so defined will be denoted by BY. T). The dependence 
on the specified angles is omitted from the notation. Finally, B,(f, T) is defined as 
N BY f, T’) where the intersection is over all Lebesgue-measurable subsets I” of I with 
IP—r) =0. The sets B,(f, T) and so on are defined analogously. These ten sets 
are closed and each set denoted with a superscript reduces to the corresponding one 
denoted without a superscript, for a proper choice of I’. Finally, dropping (f, T) 
from the notation, as we usually shall do when there is no ambiguity, B,D B,D B, 
DB, DB. The results to be discussed here involve the complement of a boundary 
cluster set of f at 1. The results are correct for any of the ten Shales But are 
strongest when the smallest set B is chosen and will be proved with this choice (see 
ti Pr ino discussion in Section 3). 

A value $\alpha$ will be called an $\omega$-cluster value of the pair $(f, \Gamma)$ at 1 if there is a sequence $\{z_n, n \ge 1\}$ of points in the unit disc for which

$$\lim_{n\to\infty} z_n = 1, \qquad \lim_{n\to\infty} f(z_n) = \alpha, \qquad \inf_n \mu(z_n, \Gamma) > 1-\omega. \tag{11.1}$$

The set of $\omega$-cluster values will be denoted by $C((f, \Gamma), \omega)$, or by $C(\omega)$ if there is no ambiguity, and we also define $C(0) = \bigcap_{\omega>0} C(\omega)$. A point $\alpha$ is in $C(0)$ if and only if there is a $z_n$ sequence as above except that the last condition is replaced by $\lim_{n\to\infty} \mu(z_n, \Gamma) = 1$. If $\Gamma$ is an arc with 1 at one endpoint, of length $< 2\pi$, $C(0)$ is the set of tangential cluster values of $f$ at 1, on approach from the same side as $\Gamma$, and $C(\omega)$ is the set of cluster values of $f$ at 1 on approach in an angle formed by the tangent to the perimeter at 1 on the side of $\Gamma$, and a ray to 1 making any angle $< \pi\omega$ with this tangent (see the corresponding discussion in Doob (1963)).

A value $\alpha$ will be said to be in the $\omega$ range of the $(f, \Gamma)$ pair if there is a sequence $\{z_n, n \ge 1\}$ for which (11.1) is true except that the second condition is replaced by $f(z_n) = \alpha$. The $\omega$ range will be denoted by $R((f, \Gamma), \omega)$, or by $R(\omega)$ if there is no ambiguity, and we define $R(0) = \bigcap_{\omega>0} R(\omega)$. Obviously $R(\omega) \subset C(\omega)$.

It is clear that $B$, $C(\omega)$, $R(\omega)$ depend only on the part of $\Gamma$ near 1 and that, if $L$ is a linear transformation of the unit disc onto itself with $L(1) = 1$, then $B(f, \Gamma) = B(f(L^{-1}), L\Gamma)$ and the corresponding relations for $R(\omega)$ and $C(\omega)$ are also true.

Let $\{z_n, n \ge 1\}$ be a sequence of points in the unit disc with limit 1. Let $A_n(d)$ be the subset of the unit disc for which the hyperbolic distance from $z_n$ is less than $d$. If $0 < \omega < 1$ and if

$$\inf_n \mu(z_n, \Gamma) > 1-\omega, \qquad \lim_{d\to\infty} \inf_n \sup_{z \in A_n(d)} \mu(z, \Gamma) = 1, \tag{11.2}$$

the set $\Gamma$ will be said to be weakly $\omega$-cohesive for the $z_n$ sequence. Let $A_n(d, \varepsilon)$ be the open component containing $z_n$ of $A_n(d) \cap \{z : \mu(z, \Gamma) > \varepsilon\}$. The set $\Gamma$ will be said to be $\omega$-cohesive for the $z_n$ sequence if

$$\inf_n \mu(z_n, \Gamma) > 1-\omega, \qquad \lim_{d\to\infty} \inf_n \sup_{z \in A_n(d, 1-\omega')} \mu(z, \Gamma) = 1, \tag{11.3}$$

when $\omega-\omega'$ is positive and sufficiently small.

We shall say that a pair $(f, \Gamma)$ is [weakly] $\omega$-cohesive for the cluster value $\alpha$ if there is a $z_n$ sequence for which $\Gamma$ is [weakly] $\omega$-cohesive and for which $f(z_n) \to \alpha$. The definition of simultaneous [weak] $\omega$-cohesiveness for the points of a set is then carried through as in the analogous discussion in Section 5.


If $\Gamma$ is an arc of the unit disc perimeter, with endpoint 1, in the fourth quadrant, and if $z_n \to 1$ below a ray to 1 making an angle $< \pi\omega$ with the ray tangent to the perimeter at 1 and going down, the arc $\Gamma$ is $\omega$-cohesive for the $z_n$ sequence. Thus $(f, \Gamma)$ is $\omega$-cohesive for all cluster values in $C(\omega)$.
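The role of the angle $\pi\omega$ can be seen numerically: near the endpoint 1 the disc looks like a half-plane, and along a ray to 1 making an angle $\theta$ with the downward tangent the harmonic measure of a fourth-quadrant arc is close to $1-\theta/\pi$, so that it exceeds $1-\omega$ exactly when $\theta < \pi\omega$ in the limit. The sketch below evaluates this for a quarter-circle arc. The arc, the distance 0.01 from the endpoint, and all function names are assumptions made only for illustration.

```python
import cmath
import math

def harmonic_measure_arc(z, a, b, steps=20000):
    """Harmonic measure at z of the arc {exp(it): a <= t <= b} (midpoint-rule Poisson integral)."""
    h = (b - a) / steps
    total = 0.0
    for k in range(steps):
        t = a + (k + 0.5) * h
        total += (1.0 - abs(z) ** 2) / abs(cmath.exp(1j * t) - z) ** 2
    return total * h / (2.0 * math.pi)

def point_on_ray(theta, s):
    """Point at distance s from 1 on the ray making angle theta with the downward tangent at 1."""
    return 1.0 + s * cmath.exp(-1j * (math.pi / 2.0 + theta))

# Hypothetical fourth-quadrant arc with endpoint 1: {exp(it): -pi/2 <= t <= 0}.
arc = (-math.pi / 2.0, 0.0)
for theta in (math.pi / 4.0, math.pi / 2.0):
    z = point_on_ray(theta, 0.01)
    # Compare the computed value with the half-plane value 1 - theta/pi.
    print(theta / math.pi, harmonic_measure_arc(z, *arc), 1.0 - theta / math.pi)
```

The agreement is only approximate at a fixed positive distance from 1; it improves as the point moves towards the endpoint along the ray.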


Let $f_n = f$ for $n \ge 1$ and let $\Gamma_n$ be the intersection of $\Gamma$ with an open perimeter arc containing 1, whose length goes to 0 with $1/n$. Then $B(f, \Gamma) = B(\{(f_n, \Gamma_n)\})$ and the corresponding equality holds for the other nine types of boundary cluster sets and for $C(\omega)$, $R(\omega)$. Moreover $\Gamma$ is [weakly] $\omega$-cohesive for a sequence $\{z_n, n \ge 1\}$ if and only if some subsequence of the $z_n$ sequence is [weakly] $\omega$-cohesive for the $(f_n, \Gamma_n)$ sequence.

We conclude that all the theorems of Sections 6, 7, 9 and 10 on $B$, $C(\omega)$ and $R(\omega)$ are true with the definitions of the present section. The individual $(f, \Gamma)$ theorems obtained in this way generalize the theorems in Doob (1963) which treat the case in which $\Gamma$ is an open arc abutting the point 1. As is shown there, stronger conclusions


can be obtained using B® in that case; the sets of zero capacity in the individual func- 
tion-set version of Theorems 10.1 


3 and 10.2 are empty. The corresponding (fn, Tn) 
results were obtained in Doob (1933). 
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von MISES FUNCTIONALS AND MAXIMUM 
LIKELIHOOD ESTIMATION* 


By G. KALLIANPUR 
Michigan State University 


1. INTRODUCTION 

In 1955, C. R. Rao and the author introduced a class of estimators of a parameter $\theta$ which were Fisher consistent (FC) and which, as functionals of distribution functions, satisfied suitable regularity conditions such as Fréchet differentiability (Kallianpur and Rao, 1955). It was shown that any statistic belonging to this class is consistent and asymptotically normally distributed with asymptotic variance greater than or equal to $[n\,i(\theta)]^{-1}$, where $i(\theta)$ is Fisher's information function.

In view of this result, R. A. Fisher's definition of efficiency (for a recent discussion of this and related concepts see Rao, 1952) becomes meaningful or justifiable at least so far as the class of FC, Fréchet differentiable estimators is concerned. However, such an approach can be considered useful or interesting provided this class is general enough and if it can be shown that there are estimators belonging to this class that are efficient. In particular, it would be desirable to prove that the maximum likelihood (ML) estimator is a member of this class (and hence efficient with respect to this class). Although, in a later paper Rao (1957) was able to show this to be the case when $\theta$ is a parameter in a multinomial distribution, the question in general remained unanswered in the joint paper by the author and Rao (1955). The authors of that paper were not able, under any reasonable set of assumptions (on the density function in the continuous and the probability function in the infinite discrete case), to prove the Fréchet differentiability of the ML estimator. Apparently, at the root of the difficulty is the fact that Fréchet differentiability is too severe a restriction when dealing with the "infinite" dimensional (i.e. non-multinomial) situation.

In this article we propose to examine the problem afresh by replacing Fréchet differentiability by a weaker analytical concept (and thus enlarging the class of estimators), that of Volterra (or, as it is sometimes called, Gâteaux) differentiability. The notion of V-differentiability was introduced into statistical work by P. von Mises and was known to the authors (Kallianpur and Rao, 1955) who, however, did not recognise its suitability at the time.

We shall consider Fisher consistent, von Mises functionals of the second order whose precise definition is given in the next section. All the results corresponding to those in Kallianpur and Rao (1955) will be derived also for estimators belonging to the new class. But the main concern of the paper will be to prove that under suitable conditions, the ML estimator is a FC, von Mises functional of the second order. Thus the class of FC, von Mises functionals of the second order possesses the property, mentioned earlier, which the class of FC, Fréchet differentiable estimators has not been shown to have. Although the conditions under which the V-differentiability and efficiency of the ML estimator are proved are more stringent than those needed for showing its consistency and asymptotic normality, they are of a fairly general nature and apply to a reasonably wide class of distributions. Hence no attempt has been made to seek refinements in this direction. The interested reader will, no doubt, be able to improve upon them.

Finally, the closing section offers some remarks on another concept of efficiency recently introduced by Rao (1962).

* ... on the occasion of his 70th birthday.


2. VON MISES FUNCTIONALS OF THE SECOND ORDER 


The following notation will be adhered to throughout what follows:

(i) The parameter to be estimated, $\theta$, lies in an open interval $I$ of the real line;

(ii) $F_\theta$ stands for the common distribution function of the independent observations $X_1, \ldots, X_n$ when $\theta$ is the "true" value of the parameter;

(iii) $P_\theta$ is the infinite product measure determined by $F_\theta$ on the (infinite) product of the real line with itself, a generic point of the latter space being denoted by $\omega = (x_1, x_2, \ldots)$;

(iv) For each $n$, and $\omega = (x_1, x_2, \ldots)$, $F_n(\cdot, \omega)$ denotes the empirical distribution function when the observed sample is given by $(x_1, \ldots, x_n)$.

The definition of a von Mises functional of the second order given below is a slight modification of the one adopted in a recent paper by Filippova (1962).

We shall consider functionals $T$ defined on a subset $S_T$ of the space of all distribution functions (d.f.'s). It will be assumed that $F_\theta$ belongs to $S_T$ for all $\theta$ in $I$. A subset $\tau_T$ of $S_T$ is said to be starlike at $F_\theta$ if for every $W \in \tau_T$ and $t \in [0, 1]$, the d.f. $F_\theta + t(W-F_\theta) \in \tau_T$.

A functional $T$ is $m$ times differentiable at $F_\theta$ in the Volterra sense (or $m$ times V-differentiable) relative to $\tau_T$, starlike at $F_\theta$, if for $k = 1, \ldots, m$,

(a) $\dfrac{d^k}{dt^k} T[F_\theta + t(W-F_\theta)]$ exists for every $t$ in $[0, 1]$ and every $W$ in $\tau_T$, and

(b) there exists a functional $T^{(k)}[F_\theta; x_1, \ldots, x_k]$ depending on $F_\theta$ and $k$ real variables $x_1, \ldots, x_k$ such that for every $W \in \tau_T$ (writing $h = W-F_\theta$)

$$\frac{d^k}{dt^k} T[F_\theta + th]\Big|_{t=0} = \int\cdots\int T^{(k)}[F_\theta; x_1, \ldots, x_k]\, dh(x_1)\cdots dh(x_k).$$

Here, as in the sequel, all integrations are over the entire domain of definition of the variables.

The functional $T^{(k)}[F_\theta; x_1, \ldots, x_k]$ is assumed to be a Borel function of $(x_1, \ldots, x_k)$ and is called the $k$-th V-derivative of $T$ at $F_\theta$.
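As a small numerical illustration of conditions (a) and (b), one may take the functional $T[F] = (\int x\,dF)^2$, for which $T^{(1)}[F; x] = 2(\int y\,dF)\,x$ and $T^{(2)}[F; x_1, x_2] = 2x_1x_2$ serve as V-derivatives. The sketch below compares finite-difference derivatives in $t$ with the integral representations for two finitely supported distributions. The functional, the two distributions and all names in the code are assumptions chosen for illustration and do not come from the paper.

```python
def mixture(F, W, t):
    """F + t (W - F) for finitely supported distributions given as {point: mass} dicts."""
    points = set(F) | set(W)
    return {x: (1.0 - t) * F.get(x, 0.0) + t * W.get(x, 0.0) for x in points}

def mean(F):
    return sum(x * p for x, p in F.items())

def T(F):
    """Illustrative functional (an assumption, not from the text): squared mean."""
    return mean(F) ** 2

def v_derivatives_at_zero(T, F, W, eps=1e-3):
    """One-sided difference estimates of d/dt and d^2/dt^2 of T[F + t(W - F)] at t = 0."""
    g0, g1, g2 = (T(mixture(F, W, k * eps)) for k in range(3))
    first = (-3.0 * g0 + 4.0 * g1 - g2) / (2.0 * eps)
    second = (g0 - 2.0 * g1 + g2) / eps ** 2
    return first, second

F = {0.0: 0.5, 2.0: 0.5}      # mean 1
W = {1.0: 0.25, 5.0: 0.75}    # mean 4
first, second = v_derivatives_at_zero(T, F, W)

# Integral representations of (b), integrated against the signed measure h = W - F.
h = {x: W.get(x, 0.0) - F.get(x, 0.0) for x in set(F) | set(W)}
first_integral = sum(2.0 * mean(F) * x * m for x, m in h.items())
second_integral = sum(2.0 * x1 * x2 * m1 * m2
                      for x1, m1 in h.items() for x2, m2 in h.items())
print(first, first_integral, second, second_integral)
```

Only non-negative values of $t$ are used in the differences, in keeping with the requirement that the derivative exist for $t$ in $[0, 1]$.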


We are now in a position to introduce our basic definition.


MAXIMUM LIKELIHOOD ESTIMATION 


Definition 1: $T$ is a von Mises functional of the second order at $F_\theta$ if the following three conditions are fulfilled:

(i) There exists a set $\tau_T(\theta)$ starlike at $F_\theta$ such that

$$\lim_{n\to\infty} P_\theta\{\omega : F_n(\cdot, \omega) \in \tau_T(\theta)\} = 1;$$

(ii) $T$ is twice V-differentiable at $F_\theta$ relative to $\tau_T(\theta)$;

(iii) for every $\varepsilon > 0$,

$$\lim_{n\to\infty} P^*_\theta\{\omega : \sqrt{n}\, d_n(\omega) > \varepsilon\} = 0,$$

where

$$d_n(\omega) = \sup_{0 \le t \le 1} \Bigl|\frac{d^2}{dt^2} T[F^{(t)}(\cdot, \omega)]\Bigr|, \qquad F^{(t)}(\cdot, \omega) = F_\theta(\cdot) + t[F_n(\cdot, \omega)-F_\theta(\cdot)],$$

and $P^*_\theta$ is the outer measure corresponding to $P_\theta$.

Condition (iii) of the definition is stated in terms of outer measure since this is all that is needed for our purpose and to avoid any discussion of the measurability of $d_n(\omega)$. Observe that $d_n(\omega)$ is assumed and defined only for $\omega$ in the set $M_{n,T} = \{\omega : F_n(\cdot, \omega) \in \tau_T(\theta)\}$ so that the set displayed in (iii) is necessarily a subset of $M_{n,T}$. Although $T[F_n(\cdot, \omega)]$ is defined only for $\omega \in M_{n,T}$ its definition can be formally completed for all $\omega$ in many ways (for instance, by setting it equal to a constant when $\omega \notin M_{n,T}$). Since $P_\theta(\tilde{M}_{n,T})$, the probability of the complement of $M_{n,T}$, tends to zero, the values of $T[F_n(\cdot, \omega)]$ on the complement of $M_{n,T}$ do not in any way affect the asymptotic results in which we are interested. It will be assumed that the $T[F_n(\cdot, \omega)]$ thus obtained is, for each $n$, a measurable $\omega$-function.

(iv) If $F_{\theta+\eta} \in \tau_T(\theta)$ for all $\eta$ sufficiently small ($\theta$ being the true value of the parameter) then

$$\sup_{0 \le t \le 1} \Bigl|\frac{d^2}{dt^2} T[F_\theta + t(F_{\theta+\eta}-F_\theta)]\Bigr| = o(\eta) \quad \text{as } \eta \to 0.$$

The conditions to be imposed on $F_\theta$ will ensure that the first part of (iv) is always fulfilled so that essentially (iv) will be a restriction on the functional $T$.

We shall suppose that the random variables have a common probability density $f(x, \theta)$ with respect to some fixed $\sigma$-finite measure $\lambda$. The regularity conditions under which the results will be obtained are naturally stronger than those needed to prove only consistency and asymptotic normality. The following assumptions will be made throughout the rest of this paper.

(A) These are essentially the conditions assumed by Cramér (1946). For every $x$ except at most on a set of $\lambda$ measure zero, the derivatives $\partial^i \log f/\partial\theta^i$ $(i = 1, 2, 3)$ exist for every $\theta$ in $I$. For any $\theta$ in $I$, $|\partial^i f/\partial\theta^i| < G_i$ $(i = 1, 2)$, where $G_i$ are integrable with respect to $\lambda$ over $(-\infty, \infty)$, while $|\partial^3 \log f/\partial\theta^3| < H$ for all $\theta$ in $I$, where $E_\theta H < K < \infty$, $K$ being independent of $\theta$. For every $\theta$ in $I$,

$$0 < \int \Bigl(\frac{\partial \log f}{\partial\theta}\Bigr)^2 f\, d\lambda < \infty.$$

(B) Let $a_0(x, \theta) = \log f(x, \theta)$, $a_i(x, \theta) = \partial^i \log f(x, \theta)/\partial\theta^i$ $(i = 1, 2)$ and $h(\theta) = E_\theta H$.

(i) For each $\theta$ in $I$ there exists a neighbourhood $N$ of $\theta$ such that $E_{\theta'}[a_i(X, \theta)]$ $(i = 0, 1, 2)$ exists for $\theta' \in N$.

(ii) For $i = 1, 2$, $E_\theta[a_i(X, \theta)]^2 < \infty$ and $E_\theta H^2 < \infty$ for each $\theta$ in $I$.

(iii) For every $\theta$ in $I$, the functions $g_\eta(x, \theta) = \{\eta f(x, \theta)\}^{-1}[f(x, \theta+\eta)-f(x, \theta)]$ ($\eta$ being positive and tending to zero) are uniformly integrable with respect to $F_\theta$.

Conditions (ii) and (iii) of (B) are somewhat stronger than are actually needed in the proof of the results. A consequence of (ii) and (iii) is that $E_{\theta'}[a_i(X, \theta)]$ $(i = 1, 2)$ are continuous functions of $\theta'$ in $N$, differentiable with respect to $\theta'$ at $\theta' = \theta$ with derivative equal to $E_\theta[a_1(X, \theta)\, a_i(X, \theta)]$.
We recall that a functional $T$ is Fisher consistent if $T[F_\theta] = \theta$ for all $\theta$ in $I$.

Now if $T$ is any FC, von Mises functional of the second order (we shall omit the phrase "at $F_\theta$"), it is easy to verify that

$$\Bigl| n^{1/2}\{T[F_n(\cdot, \omega)]-\theta\} - n^{1/2}\int T^{(1)}[F_\theta; x]\, d[F_n(x, \omega)-F_\theta(x)] \Bigr| \le n^{1/2} d_n(\omega)\, \chi_{n,T}(\omega) + \epsilon_n(\omega),$$

where $\chi_{n,T}$ is the characteristic function of $M_{n,T}$ and $\epsilon_n$ converges to zero in probability. From condition (iii) of Definition 1, it follows that the random variable on the left side of the above inequality converges to zero in probability. From this, and a well known result of Khinchin (see Kallianpur and Rao, 1955) on the normal convergence of sums of independent, identically distributed random variables we immediately have our first result.

Theorem 1: If $T$ is a FC, von Mises functional of the second order, then $n^{1/2}\{T[F_n]-\theta\}$ is asymptotically normally distributed if and only if $\int (T^{(1)}[F_\theta; x])^2\, dF_\theta(x)$ is finite.

Theorem 1 is the analogue of Theorem 3 of Kallianpur and Rao (1955) stated there for Fréchet differentiable FC functionals. Let $\mathcal{M}_\theta$ be the class of all measurable, FC, von Mises functionals of the second order for which $\int (T^{(1)}[F_\theta; x])^2\, dF_\theta(x)$ is finite. By imposing suitable regularity conditions on $F_\theta$ a lower bound for the asymptotic variance of $T \in \mathcal{M}_\theta$ will be obtained, as in Kallianpur and Rao (1955), which coincides with the bound given by Fisher.
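The relation between the first V-derivative and Fisher's bound can be checked on a simple case that is not treated in the text and is assumed here purely for illustration: the exponential density $f(x, \theta) = \theta e^{-\theta x}$ with the Fisher consistent functional $T[F] = 1/\int x\,dF$. In this case the derivative of $T$ in the direction of a point mass at $x$ equals $a_1(x, \theta)/i(\theta) = \theta - \theta^2 x$, and $\int (T^{(1)})^2\,dF_\theta = \theta^2 = 1/i(\theta)$. The function names, the point-mass direction and the quadrature settings in the sketch are assumptions.

```python
import math

def T_mixture(theta, x, t):
    """T[F] = 1/mean(F) at F_theta + t (delta_x - F_theta), F_theta exponential with rate theta."""
    return 1.0 / ((1.0 - t) / theta + t * x)

def first_v_derivative(theta, x, eps=1e-4):
    """One-sided difference estimate of d/dt T[F_theta + t(delta_x - F_theta)] at t = 0."""
    g0, g1, g2 = (T_mixture(theta, x, k * eps) for k in range(3))
    return (-3.0 * g0 + 4.0 * g1 - g2) / (2.0 * eps)

def score_over_information(theta, x):
    a1 = 1.0 / theta - x        # d/dtheta of log(theta * exp(-theta * x))
    info = 1.0 / theta ** 2     # Fisher information i(theta) of the exponential density
    return a1 / info

def variance_integral(theta, steps=20000):
    """Midpoint-rule value of the integral of (a1/i)^2 dF_theta over [0, 40/theta]."""
    upper = 40.0 / theta
    h = upper / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += score_over_information(theta, x) ** 2 * theta * math.exp(-theta * x)
    return total * h

theta, x = 2.0, 1.5
print(first_v_derivative(theta, x), score_over_information(theta, x))
print(variance_integral(theta), theta ** 2)
```

The second printed pair compares the asymptotic variance integral of Theorem 1 with $1/i(\theta)$, the bound given by Fisher, for this particular functional.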


3. THE MAXIMUM LIKELIHOOD FUNCTIONAL AND ITS PROPERTIES

Now let $\theta_0$ denote the unknown true value of $\theta$ and for $\delta$ a sufficiently small number to be specified later let $U_\delta$ be the open interval $(\theta_0-\delta, \theta_0+\delta)$ whose closure $(\bar{U}_\delta)$ is contained in $I$. Let $A_\delta(\theta_0)$ be the set of all distribution functions $W$ satisfying the following conditions:

(i) The integrals $\int a_i(x, \theta)\, dW$ $(i = 0, 1, 2)$ exist and are continuous functions of $\theta$ in $U_\delta$;

(ii)


I Paste, OAW] < 8%, Jaçe, Ood W <— ye 
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MAXIMUM LIKELIHOOD ESTIMATION 


where we write 


P= ô log fy” 
Ep (Zp Jo > and [Hee < 2K. 


The set A; (09) i 
o) is clearly non empty si i i i x 
it is easily seen that Fy b pty since it contains Fy,. In view of conditions (B) 
nee at F} belongs to A,(09) for all 0 in some N(69). This fact will be 
T. Now define S; (0) to be the set of all distributions yO =F, (WF, 
where WeA,(0) and 0 < i i në të 
in PG po o) a KEK 1. Itis easy to verify that S,(09) is starlike at Fy. For 
E a së “de V)— Jade, OjAV exists and defines, for every Oin U;,a factional 
respec i f ; 5 y the definition of A;(9), ġ'(0, V) (primes denote differentiations with 
ae ) Et and is given by fate, 0)4V which is continuous in Ọs. Taking 
= V, and applying the finite Taylor expansion to $'(0, V) we write 
In accordance with the notation of the preceding section we shall write $T_\delta(\theta_0)$ instead of $S_\delta(\theta_0)$.

The proof that the MLF $\hat\theta$ is a von Mises functional of the second order consists of two parts. First we show its Fisher consistency and derive some of its analytical properties regarded as a functional defined on $T_\delta(\theta_0)$.

Theorem 2: The ML functional $\hat\theta$ is FC and is twice V-differentiable relative to the set $T_\delta(\theta_0)$. The first and second order V-derivatives of $\hat\theta$ at $F_{\theta_0}$ are given by $k^{-2}a_1(x, \theta_0)$ and $2k^{-4}a_1(x_1, \theta_0)\,a_2(x_2, \theta_0)$ respectively.
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0 I T; O) as 1}(0)= BH. 
(B) Let $a_0(x, \theta) = \log f(x, \theta)$, $a_i(x, \theta) = \partial^i a_0(x, \theta)/\partial\theta^i$ $(i = 1, 2)$ and …

(i) For each $\theta$ in $I$ there exists a neighbourhood $N$ of $\theta$ such that $E_{\theta'}[a_i(X, \theta)]$ $(i = 0, 1, 2)$ exists for $\theta' \in N$.

(ii) For $i = 1, 2$, $E_\theta[a_i(X, \theta)]^2 < \infty$ and $E_\theta H^2 < \infty$ for each $\theta$ in $I$.

(iii) For every $\theta$ in $I$, the functions $g_\eta(x, \theta) = \ldots$ ($\eta$ being positive and tending to zero) are uniformly integrable with respect to $F_\theta$.

Conditions (ii) and (iii) of (B) are somewhat stronger than are necessary in the proof of the results. A consequence of (ii) and (iii) is that the expectations $E_{\theta'}[a_i(X, \theta)]$ $(i = 1, 2)$ are continuous functions of $\theta'$ in $N$, differentiable with respect to $\theta'$ at $\theta' = \theta$ with derivative equal to $E_\theta[a_i(X, \theta)\,a_1(X, \theta)]$.


We recall that a functional $T$ is Fisher consistent (FC) if $T[F_\theta] = \theta$ for all $\theta$ in $I$.

Now if $T$ is any FC, von Mises functional of the second order (we shall omit the phrase “at $F_{\theta_0}$”), it is easy to verify that


METRO} —at | PULP): a) AEE, o)—Fyl| 
S nid,(o) Sy, (0) člo), 
nT 


where $\phi_n$ is the characteristic function of $M_{n,T}$ and $\varepsilon_n$ converges to zero in probability. From condition (iii) of Definition 1 it follows that the random variable on the left side of the above inequality converges to zero in probability. From this and a well known result of Khinchin (see Kallianpur and Rao, 1955) on the convergence of sums of independent, identically distributed random variables we immediately have our first result.


Theorem 1: If $T$ is a FC, von Mises functional of the second order, then $n^{1/2}\{T[F_n] - \theta_0\}$ is asymptotically normally distributed if and only if $\int (T^{(1)}[F_{\theta_0}; x])^2\,dF_{\theta_0}(x)$ is finite.


ated 

m 3 of Kallianpur and Rao (1955) a. 
iable FC functionals, Let 4 be the class of all wea njërit 
8 of the second order for which J (TOL sa)? al | a ia 
arity conditions on P 


otic 
9 4 lower bound for the ene 
j btained, as in Kallianpur and Rao (1955), which coin 
with the bound given by Fisher, 


8. T 


E, 
e of 0 and for $ a sufficiently aot 
tet A Pen interval (9-8, 0-1-8) whose a 
$ © z . . 7 i 
the following conditions : a09) be the set of all distribution functions W satisty 

(i) The integrals J aa, Oaw (i 
Functions of g in G, ; 

(ii) 


(U) is contained in 7 


‘nuous 
exist and are continuot 


* Saxe, 0a Kg 


i 
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MAXIMUM LIKELIHOOD ESTIMATION 


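where we write $k^2 = E_{\theta_0}\left(\dfrac{\partial \log f}{\partial\theta}\right)^2$, and $\int H(x)\,dW < 2K$.

The set $A_\delta(\theta_0)$ is clearly non-empty since it contains $F_{\theta_0}$. In view of conditions (B) it follows that $F_\theta$ belongs to $A_\delta(\theta_0)$ for all $\theta$ in some $U_\delta$. This fact will be used later. Now define $S_\delta(\theta_0)$ to be the set of all distributions $V^{(t)} = F_{\theta_0} + t(W - F_{\theta_0})$ where $W \in A_\delta(\theta_0)$ and $0 \le t \le 1$. It is easy to verify that $S_\delta(\theta_0)$ is starlike at $F_{\theta_0}$. For $V$ in $S_\delta(\theta_0)$, $\varphi(\theta, V) = \int a_0(x, \theta)\,dV$ exists and defines, for every $\theta$ in $U_\delta$, a functional on $S_\delta$. By the definition of $A_\delta(\theta_0)$, $\varphi'(\theta, V)$ (primes denote differentiations with respect to $\theta$) exists and is given by $\int a_1(x, \theta)\,dV$ which is continuous in $U_\delta$. Taking $V = V^{(t)}$ and applying the finite Taylor expansion to $\varphi'(\theta, V^{(t)})$ we write

$$\varphi'(\theta, V^{(t)}) = \varphi'(\theta_0, V^{(t)}) + (\theta - \theta_0)\int a_2(x, \theta_0)\,dV^{(t)} + \tfrac12\beta(\theta - \theta_0)^2 \int H\,dV^{(t)} \qquad (3.1)$$
$$= -k^2(\theta - \theta_0) + \lambda_t,$$

where $|\beta| < 1$ and $|\lambda_t| \le \delta^2(2K + 1) + K\delta^2$. From this it is easy to see that $\varphi'(\theta_0 - \delta, V^{(t)}) > 0$ and $\varphi'(\theta_0 + \delta, V^{(t)}) < 0$ provided $0 < \delta < k^2/(3K + 1)$. Since $\varphi'(\theta, V^{(t)})$ is continuous in $U_\delta$, it follows that for each $V^{(t)}$ in $S_\delta(\theta_0)$ the equation $\varphi'(\theta, V^{(t)}) = 0$ has a root in $U_\delta$. This root, which we shall denote by $\hat\theta[V^{(t)}]$, is a function or functional of $V^{(t)}$ and is defined on $S_\delta(\theta_0)$. Furthermore, again utilizing the definition of $A_\delta(\theta_0)$ and $V^{(t)}$, we have $\varphi''(\theta, V^{(t)}) \le -k^2 + 4K\delta < 0$ for all $\theta$ in $U_\delta$, $\delta$ being chosen such that $0 < \delta < \min\{k^2/(3K + 1),\ k^2/4K\}$. It follows that $\hat\theta[V^{(t)}]$ is the unique root of the equation $\varphi'(\theta, V^{(t)}) = 0$ in $U_\delta$ as also the unique maximum of $\varphi(\theta, V^{(t)})$ in $U_\delta$.

Essentially the same argument as Cramér's (1946) consistency proof shows that the $P_{\theta_0}$-probability that $F_n(\cdot, \omega)$ belongs to $A_\delta(\theta_0)$ tends to one as $n \to \infty$ and hence that for all $t$ in $[0, 1]$, $P_{\theta_0}[\omega : F_n^{(t)}(\cdot, \omega) \in S_\delta(\theta_0)] \to 1$. Observe that whenever $F_n^{(1)}(\cdot, \omega) \in S_\delta(\theta_0)$, the likelihood function of the sample coincides on $U_\delta$ with the function $\varphi(\theta, F_n^{(1)})$. It follows that the unique root of the likelihood equation lying in $U_\delta$ for which the likelihood is a local maximum (the existence of such a root is a consequence of a combination of arguments of Cramér (1946) and Huzurbazar (1948) and need not be repeated here) is indeed given by $\hat\theta[F_n(\cdot, \omega)]$ where $\hat\theta$ is the functional defined above. We shall refer to $\hat\theta$ as the maximum likelihood functional (MLF) although, as is well known, $\hat\theta[F_n(\cdot, \omega)]$ does not necessarily make the likelihood an absolute maximum.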
Proof: Setting $t = 0$ and $\theta = \hat\theta[F_{\theta_0}]$ in equation (3.1) we have

$$(\hat\theta[F_{\theta_0}] - \theta_0)\left\{-k^2 + \tfrac12\beta(\hat\theta[F_{\theta_0}] - \theta_0)\int H\,dF_{\theta_0}\right\} = 0.$$

Since $|\hat\theta[F_{\theta_0}] - \theta_0| < \delta$ the second factor on the right, being less than $-k^2 + K\delta$, cannot be zero so that $\hat\theta[F_{\theta_0}] = \theta_0$. This proves Fisher consistency. First we show that $\hat\theta[V^{(t)}]$ is a continuous function of $t$ in $[0, 1]$. In subsequent arguments we shall use the expression “for $\delta$ sufficiently small” to mean that $\delta$ has been chosen less than a fixed positive number $\delta_0$ which depends on $k^2$ and $K$. That $\delta_0$ can be so determined will be apparent from the context. Now writing $V^{(t)} = F_{\theta_0} + th$ $(h = W - F_{\theta_0})$, $a_i$ for $a_i(x, \theta_0)$ $(i = 1, 2)$ and letting $t'$ be such that $0 \le t + t' \le 1$ it follows from (3.1) after some simplification that
[Oly yery| <S [PC] Sadh f adV—fadhfadV | 
+H(|fadV f Hdh|+|fHdvV f adh|)] 
x [(2—(t-40’) fagdh—40( | [HV +#/[Hdh|)) 
X (k?—tfagdh—48| SHdV|)—}| Jad VO f HAVO |J 
the quantity on the right side being positive for $\delta$ sufficiently small, and tending to zero as $t' \to 0$. The assertion is thus proved. Again from (3.1) we have
POVH] OON fa, dVO fadh fadh fad V” 

HIJAU) Jad VO JHdh— 4 PAÇt) fa,dh$ Hd V O] 

X [aA VOU fa dha PAÇ--tYJHA VO fH dh} 

{fad V+ SBA) [HAV }— 4p fa, d VO [HAVO], 
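where $\Delta(t) = \hat\theta[V^{(t)}] - \theta_0$. Since $\Delta(t)$ has been shown to be continuous it follows on making $t' \to 0$ that $\hat\theta[V^{(t)}]$ is differentiable with respect to $t$ and that

$$\frac{d}{dt}\hat\theta[V^{(t)}] = \frac{P(t)}{R(t)} \qquad (3.2)$$

where

$$P(t) = \int a_1\,dV^{(t)} \int a_2\,dh - \int a_1\,dh \int a_2\,dV^{(t)} + \tfrac12\beta\Delta(t)\left[\int a_1\,dV^{(t)} \int H\,dh - \int a_1\,dh \int H\,dV^{(t)}\right]$$

and

$$R(t) = \left(\int a_2\,dV^{(t)} + \tfrac12\beta\Delta(t)\int H\,dV^{(t)}\right)^2 - \tfrac12\beta \int a_1\,dV^{(t)} \int H\,dV^{(t)}.$$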
Since $\hat\theta[V^{(t)}]$ and hence $\Delta(t)$ is differentiable with respect to $t$, so are $P(t)$ and $R(t)$ and it follows from (3.2) that $\dfrac{d^2}{dt^2}\hat\theta[V^{(t)}]$ exists. We have (primes now denoting differentiations with respect to $t$)

$$\frac{d^2}{dt^2}\hat\theta[V^{(t)}] = \frac{P'(t)}{R(t)} - \frac{P(t)R'(t)}{R^2(t)}. \qquad (3.3)$$

Finally putting $t = 0$ in (3.2) and (3.3) we obtain


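$$\left(\frac{d}{dt}\hat\theta[V^{(t)}]\right)_{t=0} = \int k^{-2}a_1\,dh,$$

and

$$\left(\frac{d^2}{dt^2}\hat\theta[V^{(t)}]\right)_{t=0} = 2\int\!\!\int k^{-4}a_1(x_1, \theta_0)\,a_2(x_2, \theta_0)\,dh(x_1)\,dh(x_2).$$

The proof of Theorem 2 is thus complete.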
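The first-order formula can be checked numerically in a simple special case. The sketch below is only an illustration under stated assumptions: it takes the exponential density $f(x, \theta) = \theta e^{-\theta x}$ (not a family discussed in the paper), for which $a_1(x, \theta) = 1/\theta - x$ and $k^2 = 1/\theta_0^2$, and a hypothetical contaminating distribution $W$ that enters only through its mean. The names `score_integral`, `ml_functional`, `THETA0`, `DELTA` and `MEAN_W` are invented for the example, and membership of $W$ in $A_\delta(\theta_0)$ is assumed rather than verified.

```python
def score_integral(theta, mean_v):
    """phi'(theta, V) = integral of a_1(x, theta) dV for f(x, theta) = theta*exp(-theta*x).

    Here a_1(x, theta) = 1/theta - x, so the integral depends on V only through its mean.
    """
    return 1.0 / theta - mean_v


def ml_functional(mean_v, lower, upper, tol=1e-13):
    """Root of phi'(theta, V) = 0 inside (lower, upper), located by bisection."""
    if score_integral(lower, mean_v) <= 0 or score_integral(upper, mean_v) >= 0:
        raise ValueError("phi' does not change sign on the interval")
    while upper - lower > tol:
        middle = 0.5 * (lower + upper)
        if score_integral(middle, mean_v) > 0:
            lower = middle
        else:
            upper = middle
    return 0.5 * (lower + upper)


THETA0 = 2.0               # assumed true parameter value
DELTA = 0.5                # half-width of the interval U_delta (assumption)
MEAN_F = 1.0 / THETA0      # mean of F_theta0
MEAN_W = 0.6               # mean of the hypothetical distribution W
K2 = 1.0 / THETA0 ** 2     # k^2 = E(d log f / d theta)^2 for this family


def theta_hat(t):
    """ML functional evaluated at V^(t) = F_theta0 + t (W - F_theta0)."""
    mean_vt = (1.0 - t) * MEAN_F + t * MEAN_W
    return ml_functional(mean_vt, THETA0 - DELTA, THETA0 + DELTA)


def predicted_derivative():
    """k^{-2} * integral of a_1(x, theta0) dh, with h = W - F_theta0."""
    integral_a1_dh = (1.0 / THETA0 - MEAN_W) - (1.0 / THETA0 - MEAN_F)
    return integral_a1_dh / K2


def finite_difference(step=1e-6):
    """Forward difference approximation to d/dt theta_hat at t = 0."""
    return (theta_hat(step) - theta_hat(0.0)) / step


if __name__ == "__main__":
    print(predicted_derivative(), finite_difference())
```

In this family the root is available in closed form (the reciprocal of the mean), so the bisection is used only to mirror the definition of $\hat\theta$ as the root of $\varphi'(\theta, V) = 0$ in $U_\delta$.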
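It has already been shown above that $P_{\theta_0}(M_{n,\hat\theta}) \to 1$ as $n \to \infty$ where $M_{n,\hat\theta}$ is the set $\{\omega : F_n^{(t)}(\cdot, \omega) \in T_\delta(\theta_0)\}$. This is condition (i) of Definition 1 with $T = \hat\theta$. We turn now to the proof that condition (iii) of that definition holds for $\hat\theta$. For every positive $\varepsilon$ there is an integer $n_0(\varepsilon, \delta)$ such that the following holds: $P_{\theta_0}(M_{n,\hat\theta}) > 1 - \varepsilon$ if $n > n_0$, and whenever $\omega \in M_{n,\hat\theta}$, for every fixed $t$ in $[0, 1]$, $\hat\theta[F_n^{(t)}(\cdot, \omega)]$ is the unique root of $\varphi'(\theta, F_n^{(t)}) = 0$ which maximizes $\varphi(\theta, F_n^{(t)})$ in $U_\delta$. The definition of $\hat\theta$ for all $\omega$ can be completed in such a way that $\hat\theta[F_n^{(t)}(\cdot, \omega)]$ is, for each $t$, measurable as a function of $\omega$. To prove (iii) it suffices to assume that $n > n_0$ and that $\omega$ belongs to $M_{n,\hat\theta}$.

Since $F_n^{(t)}(\cdot, \omega) \in T_\delta(\theta_0)$ it then follows from Theorem 2 that $\dfrac{d^2}{dt^2}\hat\theta[F_n^{(t)}(\cdot, \omega)]$ exists and is given by $P_n'R_n^{-1} - P_nR_n'R_n^{-2}$ where $P_n$ and $R_n$ are the functions $P$ and $R$ given immediately after formula (3.2) except that now $h = F_n(\cdot, \omega) - F_{\theta_0}$. Hence it is to be noted that $P_n$ and $R_n$ as well as their derivatives with respect to $t$ are functions of $t$ and $\omega$.

Let us write

$$A_n = \int a_1\,dF_n, \quad B_n = \int a_2\,dF_n, \quad C_n = \int H\,dF_n,$$
$$\Delta_n(t, \omega) = \hat\theta[F_n^{(t)}(\cdot, \omega)] - \theta_0 \quad\text{and}\quad \Delta_n(\omega) = \sup_{0 \le t \le 1}|\Delta_n(t, \omega)|.$$

To simplify the writing we shall suppress $\omega$ throughout the ensuing argument, but it will be supposed that $\omega \in M_{n,\hat\theta}$, and that all sets considered are $\omega$-sets. Clearly $\Delta_n < \delta$ and further from (3.1) (writing $\Delta_n(t)$ for $\Delta_n(t, \omega)$),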

A,(t) 1A [B H) AAOC nH) gH 
Hence

$$\Delta_n \le |A_n|\,\{k^2 - |B_n + k^2| - 2K\delta\}^{-1} = \xi_n, \text{ say.} \qquad (3.4)$$

Clearly $\xi_n$ converges to zero in probability by Slutsky's theorem since by the law of large numbers $A_n$ and $B_n + k^2$ both converge to zero in probability and since $\delta$ is sufficiently small. Hence for every $\varepsilon' > 0$, we have

$$P_{\theta_0}[(\Delta_n > \varepsilon') \cap M_{n,\hat\theta}] \le P_{\theta_0}(\xi_n > \varepsilon') \to 0. \qquad (3.5)$$
ave for any tin 10, 1] 


nt) and R,,(t) we h 
44, {Cnt SEF) 


Now, from the expressions for P 
ej— HAF, }?— 
[R] > pë— |B, +| JA (0,1 S Hay, 
Zz (k2— |By +P | —2K6)—2K| 4al = Pas SAY, 
E 


Pl < LAKO 


while 
Pkt Jo2 KO E ece (3.6 
ap |20 ESTE ) 
Hence ogtg1 | fn 


SANKHYA : THE INDIAN JOURNAL OF STATISTICS : SERIES A 


l 
We then have for $\gamma$ (here and in the sequel $\gamma$ is a positive number less than $\tfrac12$)


i ([ ça zi pes E > ©] Naha) < Poli LAJ Kolosi > e 


which tends to zero as $n \to \infty$ again by Slutsky's theorem since $\rho_n \to (k^2 - 2K\delta)^2$ $(> 0)$ in probability and $n^{1/2}A_n$ has an asymptotic normal distribution. Finally since


su Ea 
pee P viz 


PO |y P a(t) Rat) | 


SEa R EN RO RO, 


0Stsi 


ve have 


r ; i 
së N | d? (t) , M gë 
u nt a gh OUR) [>e JN eei 


Sri (ëse ee 1 e’) inn 
f alt) Pi e) Ni (3.7) 
+P Sig aup | am RË) Te) ngë 


It is easily seen that 


Px) =— (3 PA, IHar,). po. 


P(t) 
AC) 


From this we obtain  sup|Pi(t)| < K |A,,]sup 


| Pale) | 


Hence në sup Ray O) <S K(n7|A,| oz! (= sup 
I n 


P(t) 
RA) 


n 


) 


from which it follows that the first term of the right side of inequality (3.7) tends to 
zero. Using (3.5) we obtain after straightforward calculations that 


R(t) 
Sup | B,(i) 


| < [20 | Byte | 49K, | B, + 


+2KE,+ 3K sup 


en jo.) pa 


Since $n^{\gamma}|B_n + k^2|$, $n^{\gamma}\xi_n$ and $n^{\gamma}|A_n|$ converge to zero in probability $(0 < \gamma < \tfrac12)$ it follows from the above inequality and (3.7) that


Pa, I n” sup 


Ra) 
EZO 


> e') Narna] +0 


for every £’ > 0. 
The second term on the right side of (3.7) is not greater than

eee
Ry |r ave Jnana].
jë y | Rit | E
Pë, Lë SË Fo pe nata) +P por sup

Since both terms in the above expression approach zero as $n \to \infty$ it follows that the quantity on the left side of inequality (3.7) tends to zero as $n \to \infty$.

This completes the proof that $\hat\theta$ satisfies condition (iii) of Definition 1. The verification of condition (iv) follows without difficulty from (B). Since condition (i) has already been shown to hold these facts combined with Theorem 2 give us our main result concerning $\hat\theta$.


Theorem 3: The ML functional $\hat\theta$ is a von Mises functional of the second order.


4. EFFICIENCY OF $\hat\theta$ RELATIVE TO THE CLASS $\mathfrak{M}$


Now that $\hat\theta$ has been shown to belong to the class $\mathfrak{M}$, the next step is to show that it is efficient relative to $\mathfrak{M}$. This follows as a consequence of the next result which gives a lower bound to the asymptotic variance of estimators belonging to $\mathfrak{M}$.

Theorem 4: Let conditions (A) and (B) be assumed to hold. If $T$ is an estimator belonging to $\mathfrak{M}$, then

$$\int (T^{(1)}[F_{\theta_0}; x])^2\,dF_{\theta_0}(x) \ge \left\{E_{\theta_0}\left(\frac{\partial \log f}{\partial\theta}\right)^2\right\}^{-1}. \qquad (4.1)$$
Proof: From the finite Taylor expansion formula and Fisher consistency of $T$ we obtain the relation

$$1 = \eta^{-1}\int T^{(1)}[F_{\theta_0}; x]\,dh(x) + (2\eta)^{-1}\left(\frac{d^2}{dt^2}T[F_{\theta_0} + th]\right)_{t=\gamma}$$

where $\eta$ is sufficiently small, $0 < \gamma < 1$ and $h = F_{\theta_0+\eta} - F_{\theta_0}$. From Condition (B) and Condition (iv) of Definition 1 we have on making $\eta \to 0$

$$1 = \int T^{(1)}[F_{\theta_0}; x]\,\frac{\partial \log f(x, \theta_0)}{\partial\theta}\,dF_{\theta_0}(x). \qquad (4.2)$$

The conclusion of the theorem follows immediately on applying Schwarz's inequality to the integral on the right hand side of (4.2).

It seems appropriate at this stage to comment on a definition of efficiency recently introduced by Rao (1962). According to this definition an estimator is said to be efficient if its asymptotic correlation with the derivative of the log likelihood is unity. It follows from (4.2) that a statistic belonging to $\mathfrak{M}$ is efficient in the new sense if and only if its asymptotic variance is equal to $1/k^2$, i.e., if and only if equality holds in (4.1). Hence the new concept of efficiency and efficiency in the sense of Fisher are equivalent so far as the class $\mathfrak{M}$ is concerned.
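The way Schwarz's inequality turns relation (4.2) into the bound (4.1) can be illustrated numerically. The following is a minimal sketch, assuming the normal location family $N(\theta, 1)$ (so that $\partial \log f/\partial\theta = x - \theta$ and $k^2 = 1$) and a hypothetical clipped influence curve rescaled so that (4.2) holds; neither the family nor the clipped curve comes from the paper, and no claim is made that the corresponding functional belongs to the class considered there. The helper names (`integrate`, `normalised_influence`, `asymptotic_variance`) are illustrative.

```python
import math

THETA0 = 0.0  # assumed true location parameter


def normal_pdf(x):
    """Density of N(THETA0, 1)."""
    return math.exp(-0.5 * (x - THETA0) ** 2) / math.sqrt(2.0 * math.pi)


def score(x):
    """d log f / d theta at THETA0 for the N(theta, 1) family."""
    return x - THETA0


def integrate(g, lower=-12.0, upper=12.0, steps=24000):
    """Composite trapezoidal rule on [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.5 * (g(lower) + g(upper))
    for i in range(1, steps):
        total += g(lower + i * width)
    return total * width


def normalised_influence(clip):
    """Clipped curve psi(x) = max(-clip, min(clip, x - THETA0)), rescaled so that
    the integral of psi * score dF equals 1, as required by relation (4.2)."""
    def raw(x):
        return max(-clip, min(clip, x - THETA0))
    scale = integrate(lambda x: raw(x) * score(x) * normal_pdf(x))
    return lambda x: raw(x) / scale


def asymptotic_variance(influence):
    """Integral of the squared influence curve with respect to F_theta0."""
    return integrate(lambda x: influence(x) ** 2 * normal_pdf(x))


K2 = integrate(lambda x: score(x) ** 2 * normal_pdf(x))  # Fisher information k^2

if __name__ == "__main__":
    for clip in (1.0, 50.0):
        print(clip, asymptotic_variance(normalised_influence(clip)), 1.0 / K2)
```

With a very large clipping constant the curve coincides with the score on the integration range, which is the equality case of Schwarz's inequality; a finite clipping constant gives a variance strictly above $1/k^2$.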


The results obtained in this paper can be extended to the case when the parameter $\theta$ belongs to an open subset of $k$-dimensional Euclidean space. Finally, we offer a remark in connection with the definition of von Mises functionals of second order. Condition (iii) is very similar to the one imposed in the paper by Filippova (1962) where asymptotic distributions of second and higher order functionals are studied in a different context. Condition (iv) can, if desired, be replaced by a suitable analytical condition on $T$. For instance, (iv) is satisfied if it is assumed that the second V-derivative, $T^{(2)}[V; x_1, x_2]$ at $V$ $(V \in T_\delta(\theta_0))$ is bounded uniformly with respect to $V$, $x_1$ and $x_2$. However, such a sufficient condition would entail additional restrictions on $F_\theta$ if it is to hold for $T = \hat\theta$.
ON THE APPROXIMATION OF DISTRIBUTIONS OF SUMS OF INDEPENDENT SUMMANDS BY INFINITELY DIVISIBLE DISTRIBUTIONS*


By A. N. KOLMOGOROV 
Academy of Sciences, U.S.S.R. 


* Translation of a lecture delivered at the Indian Statistical Institute, Calcutta, in April 1962. This paper has been included in Contributions to Statistics presented to Professor P. C. Mahalanobis on the occasion of his 70th birthday.

Throughout this paper

$$\xi = \xi_1 + \xi_2 + \dots + \xi_n$$

is a sum of $n$ independent real random variables,

$$F_k(x) = P\{\xi_k < x\}, \qquad H(x) = P\{\xi < x\},$$

$$G_\sigma(x) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{x} e^{-u^2/2\sigma^2}\,du, \qquad \sigma > 0,$$

$$E(x) = G_0(x) = \begin{cases} 0 & \text{if } x \le 0 \\ 1 & \text{otherwise,} \end{cases}$$

$\mathfrak{D} = \{D\}$ is the totality of all infinitely divisible distribution functions $D(x)$ and $c_1, c_2, \ldots$ are positive constants.

The following strengthened versions of two theorems, the weaker forms of which were given in my work (Kolmogorov, 1956) will be proved:

Theorem 1: There exists a constant $c_1$ such that, in the case of identically distributed $\xi_k$, whatever be $F(x) = F_k(x)$, $k = 1, 2, \ldots, n$, there exists $D \in \mathfrak{D}$ satisfying the inequality

$$|D(x) - H(x)| \le c_1 n^{-1/3} \qquad (0.1)$$

for all $x$.

Theorem 2: There exists a $c_2$ such that, for any $\varepsilon > 0$, $L \ge 2l > 0$ the validity of the inequalities

$$E(x - l) - \varepsilon \le F_k(x) \le E(x + l) + \varepsilon \qquad (0.2)$$

for all $x$ and $k = 1, 2, \ldots, n$ implies the existence of a $D \in \mathfrak{D}$ for which

$$D(x - L) - \delta \le H(x) \le D(x + L) + \delta \qquad (0.3)$$

for all $x$, where

$$\delta = c_2 \max\left(\frac{l}{L}\left(\log\frac{L}{l}\right)^{1/2},\ \varepsilon^{1/3}\right). \qquad (0.4)$$

So goes the history of this problem.
1. From the closure of the class of all infinitely divisible distributions, introduced by Bruno de Finetti (1930), under weak convergence it immediately follows that in the case when the sum

$$\xi_t = \xi_{t1} + \dots + \xi_{tn_t}, \qquad \lim_{t\to\infty} n_t = \infty, \qquad (0.5)$$

of independent summands which are identically distributed within each series, converges weakly, the limit distribution is infinitely divisible.

It was tempting to understand this result thus : sum of a large number of 
identically distributed independent summands has a distribution, approximately 


normal. However, before my work (Kolmogorov, 1956) such an interpretation was 
not fully convincing. 


Even in the case of a sequence of identically distributed summands

$$\xi_1, \xi_2, \ldots, \xi_n, \ldots$$

a ‘completely different’ case was possible according to Doeblin (1959), in which under no normalisation

$$\zeta_n = A_n(\xi_1 + \dots + \xi_n) - B_n,$$

and for no sequence $n_1 < n_2 < \dots < n_m < \dots$ the distributions of the sums $\zeta_{n_m}$ can converge to any distribution other than the degenerate distribution $E(x - a)$. The latter, of course, can be achieved by choosing the multipliers $A_n$ sufficiently small.

Only in 1955 Yu. V. Prohorov proved that, in the case of a sequence of independent and identically distributed summands there always exists a sequence of infinitely divisible distribution functions

$$D_1(x), D_2(x), \ldots, D_n(x), \ldots$$

approximating the distribution $H_n(x)$ of the sums $\xi_1 + \dots + \xi_n$ in the sense

$$\sup_x |H_n(x) - D_n(x)| \to 0, \qquad (0.6)$$

as $n \to \infty$. Prohorov's work (1954), however, left open the question whether the convergence in (0.6) is uniform with respect to the choice of the distribution function $F(x)$ of $\xi_k$.

In terms of the uniform metric

$$\rho(F', F'') = \sup_x |F'(x) - F''(x)|$$

the problem is as follows: do the functions*

$$\psi(n) = \sup_F \rho(H, \mathfrak{D})$$

converge to zero as $n \to \infty$? My work (Kolmogorov, 1956) gave the answer to this question; it was proved that

$$\psi(n) = O(n^{-1/5}). \qquad (0.7)$$

* The supremum is taken over all distribution functions $F$.
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LIMIT DISTRIBUTIONS 
In 1960 Prohorov strengthened this result, by showing that

$$\psi(n) = O(n^{-1/3}\log^2 n). \qquad (0.8)$$

Our Theorem 1 states that*

$$\psi(n) = O(n^{-1/3}). \qquad (0.9)$$
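To make the uniform metric concrete, here is a small numerical sketch. It assumes the simplest possible summands, Bernoulli variables with success probability $p$, for which $H$ is the binomial distribution and the accompanying infinitely divisible law $\exp\{n(F - E)\}$ is the Poisson distribution with mean $np$. This particular choice of $D$ is an assumption made for illustration; it is not claimed to be the approximating law constructed in the proofs, and the function names are invented.

```python
import math


def binomial_pmf(n, p):
    """Probabilities of 0, 1, ..., n successes in n Bernoulli(p) trials."""
    return [math.comb(n, m) * p ** m * (1.0 - p) ** (n - m) for m in range(n + 1)]


def poisson_pmf(mean, size):
    """First `size` Poisson probabilities, computed by the usual recursion."""
    pmf = [math.exp(-mean)]
    for m in range(1, size):
        pmf.append(pmf[-1] * mean / m)
    return pmf


def kolmogorov_distance(n, p):
    """sup_x |H(x) - D(x)| for H = Binomial(n, p) and D = Poisson(n p).

    Beyond the point n the binomial distribution function equals 1 while the
    Poisson one keeps increasing, so the supremum is attained on 0, ..., n.
    """
    binom = binomial_pmf(n, p)
    poisson = poisson_pmf(n * p, n + 1)
    cdf_h = 0.0
    cdf_d = 0.0
    worst = 0.0
    for m in range(n + 1):
        cdf_h += binom[m]
        cdf_d += poisson[m]
        worst = max(worst, abs(cdf_h - cdf_d))
    return worst


if __name__ == "__main__":
    for p in (0.01, 0.1, 0.5):
        print(p, kolmogorov_distance(100, p))
```

For small $p$ the distance is tiny, while for $p$ near one half the purely Poisson approximant is poor, which is consistent with the proofs distinguishing a case in which a normal component is adjoined.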


The problem of estimating the function $\psi(n)$ from below naturally arose. Prohorov's students I. P. Tsaragradsky, Prohorov himself and L. D. Meshalkin occupied themselves with such estimates. The latest result of Meshalkin (1961) runs thus:
Vin) > can log n). «+ (0.10) 


2. For the case of sums

$$\xi_t = \xi_{t1} + \dots + \xi_{tn_t}$$

of random variables, independent within each series, with different distributions, Khintchine (1957) determined sufficient conditions so that the distribution of $\xi_t$ may converge weakly to an infinitely divisible distribution. The conditions are that there exist

$$\varepsilon_t \to 0, \qquad l_t \to 0$$

such that the distributions $F_{tk}$ of $\xi_{tk}$ satisfy

$$E(x - l_t) - \varepsilon_t \le F_{tk}(x) \le E(x + l_t) + \varepsilon_t.$$

Our Theorem 2 is an attempt to give a ‘uniform’ character to this result of Khintchine. The nature of the content of Theorem 2 can be made clearer with the help of the “Levy distance”

$$\rho_L(F', F'') = \inf \varepsilon$$

over all $\varepsilon$ satisfying the condition

$$F'(x - \varepsilon) - \varepsilon \le F''(x) \le F'(x + \varepsilon) + \varepsilon.$$

It is easy to see that from Theorem 2 we have the following corollary.

Corollary: If

$$\sup_k \rho_L(F_k, E) \le \varepsilon$$

then

$$\rho_L(H, \mathfrak{D}) \le c_3\,\varepsilon^{1/3}.$$

* After completing this work I came to know that F. M. Kagan, in 1961, discovered the result $\psi(n) = O(n^{-1/3}\log n)$ which is between (0.8) and (0.9). Later this result was reported by F. M. Kagan in the conference on Probability Theory and Mathematical Statistics held in Fergana (September 1962).

3. As is known, the most useful means of proving limit theorems for distributions of sums of a large number of independent variables is the apparatus of characteristic functions. The ‘direct probabilistic method’ now in this domain can only rarely compete with the potentialities of the analytical apparatus of characteristic functions. Our Theorems 1 and 2 are unusual examples of the other situation. The essential element of their proofs is Lemma 1, relating to the ‘concentration function’ introduced by Levy. Theorems of Levy and Doeblin about the properties of concentration function were strengthened by me (cf. Kolmogorov, 1958) specially for proving the earlier versions of Theorems 1 and 2 given in Kolmogorov (1956). The later development in the estimation of concentration functions belongs to Rogozin (1961a and 1961b). Rogozin too uses elementary direct probabilistic and set theoretic methods (Theorem 5.5 on subsets of a finite set).


The mathematicians, whose attention to this problem I succeeded in drawing, 
could not prove theorems of type 1 and 2 without appealing to the just indicated 
peculiar methods. 

Throughout this paper, as in Kolmogorov (1956), the methods of reasoning are 
essentially those introduced by Doeblin (1959). As is seen from what is said above, 
the transition from the degree 1/5 to the degree 1/3 in Theorem 1 was done by Prohorov. 
The removal of the factor $\log^2 n$ in the estimate of Prohorov (0.8) required (a) the use of a more precise estimate of the concentration function obtained by Rogozin and (b) some changes in Prohorov's proof, reflected in the introduction of Lemmas 5 and 6.*

In the proof of Theorem 2 the transition from the degree 1/5 to the degree 
1/3 is effected by the techniques borrowed from the work of Prohorov (1960) combined 
with the use of Lemmas 5 and 6. 


4. Besides the distance

$$\rho(F', F'') = \sup_{-\infty < x < \infty} |F'(x) - F''(x)|$$

it is natural to consider “the variation distance”

$$\rho_V(F', F'') = \tfrac12 \operatorname{var}\,(F'(x) - F''(x)) = \sup_A\,[F'(A) - F''(A)],$$

where $A$ is an arbitrary Borel measurable set on the real line.

As is known, $\rho(F', F'') \le \rho_V(F', F'')$.

Hence for the function $\psi_V(n) = \sup_F \rho_V(F^n, \mathfrak{D})$ we have the inequality $\psi_V(n) \ge \psi(n)$.

However, nothing is known about the asymptotic behaviour of $\psi_V(n)$. It is not even known whether $\psi_V(n) \to 0$ as $n \to \infty$.
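The inequality between the two distances is easy to see on a toy example. The sketch below assumes two hypothetical discrete distributions on the same three ordered atoms (the probability vectors are made up for illustration) and computes both distances directly from the definitions above.

```python
def uniform_distance(p, q):
    """sup_x |F'(x) - F''(x)| for two distributions given on the same ordered atoms."""
    gap = 0.0
    worst = 0.0
    for p_i, q_i in zip(p, q):
        gap += p_i - q_i          # difference of the distribution functions so far
        worst = max(worst, abs(gap))
    return worst


def variation_distance(p, q):
    """sup_A [F'(A) - F''(A)]: the supremum is attained on the set where p exceeds q."""
    return sum(max(p_i - q_i, 0.0) for p_i, q_i in zip(p, q))


if __name__ == "__main__":
    first = [0.2, 0.5, 0.3]    # hypothetical probabilities of three atoms
    second = [0.3, 0.3, 0.4]
    print(uniform_distance(first, second), variation_distance(first, second))
```

Every half-line is a Borel set, so the uniform distance is a supremum over a smaller family of sets than the variation distance, which is why it can never be the larger of the two.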
1. EIGHT LEMMAS 


Following Levy, for any distribution function F(x) we introduce the ‘concen- 
tration function’ 


$$Q_F(l) = \sup_x\,[F(x + l + 0) - F(x)].$$
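For a discrete distribution the concentration function can be computed directly, since with $F(x) = P\{\xi < x\}$ the quantity $F(x + l + 0) - F(x)$ is the mass of the closed interval $[x, x + l]$. The sketch below assumes a hypothetical two-point distribution with mass one half at each of $-1$ and $+1$ and looks at its convolution powers; the helper names are illustrative and the example is not taken from the paper.

```python
import math


def convolve(p, q):
    """Convolution of two discrete distributions given as {atom: probability}."""
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, 0.0) + px * qy
    return out


def power(p, m):
    """m-fold convolution power; the zeroth power is the unit mass at 0."""
    result = {0: 1.0}
    for _ in range(m):
        result = convolve(result, p)
    return result


def concentration(pmf, length):
    """Q(l) = sup_x P(x <= xi <= x + l) for a discrete distribution.

    The supremum is attained with the left end of the window at an atom,
    because sliding the left end up to the next atom never loses mass.
    """
    atoms = sorted(pmf)
    best = 0.0
    for i, left in enumerate(atoms):
        mass = 0.0
        for atom in atoms[i:]:
            if atom - left > length:
                break
            mass += pmf[atom]
        best = max(best, mass)
    return best


if __name__ == "__main__":
    two_point = {-1: 0.5, 1: 0.5}   # hypothetical distribution
    for m in (4, 16, 64):
        q = concentration(power(two_point, m), 2.0)
        print(m, q, q * math.sqrt(m))
```

The printed products stay bounded, which is the kind of decay of order $m^{-1/2}$ for convolution powers that the lemmas below exploit.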
Lemma 1: If , fork = Bs ves i L>1>0 
PE < a} = Pte, > a} =} 


ie QA) $e, Ee, 


* The first step was done by F. M. Kagan sometime before my work was written (see footnote on page 161).
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Lemma 2: Ifo—0, 150, n— qe PR then kë 
: Pe > or any dist 
function F(x) : for any distribution 


\ 


FaGae—l)—n < Fe) < FG (1-1). 
Lemma 3: If o> 0,0,> 0, then 


2 
ae 


ci 


16, .()—G, (01 < ¢ 


Lemma 4: If feF(da) —0, fatH(de) = o, h>0>0 


then $ sup | F(@)—E(x) | < cg. 


T=— © NS rs (r+l)h 
Lemma 5: IfM%=0, [&|<l, DE=o%, h>o>0 


then È sp [H(e)—G3(2)| < & Yo. 


remo the ES (11) 


Lemma 6: If ME,=0, |&|<l DE=o%, of =o% +05 


| H*G,,(2)—G,,(%)| K Go Loy. 


then 
Lemma 7: For any natural number $n$ and $0 < p < 1$

$$\sum_m \left| C_n^m p^m (1 - p)^{n-m} - \frac{(np)^m}{m!}e^{-np} \right| \le c_{10}\,p.$$
Lemma 8: Let* 0<m <1 
1—p, ifm=0 
pm) = 4 Pr if mel 
0 ifm>1 
PE 9—Pi 
gl) = “at © $ 
pū) = TI pot) gñ) = Igma). 
2 
S| pm) —aim)| < eB Pe 
Then m 


me (m1, 69 mn) is an n-dimensional vector and X 
lows m = (Mi, a 


n-nogative integral components, 
Lemmas 2, 3, 5 are proved by simple calculations. Lemma 1 is an immediate corollary of Theorem 1 of Rogozin (1961a). Lemma 7 is proved in the work of Prohorov (1953). Lemma 4 follows from the estimate ($\xi$ has distribution $F$)

$$|F(x) - E(x)| \le P\{|\xi| \ge |x|\} \le \sigma^2/x^2$$

(Chebyshev's inequality).


Lemmas 5 and 6 reduce to the known estimate

$$|H - G_\sigma| \le c_{12}\,\frac{l}{\sigma} \qquad (1.1)$$

which follows from Lyapunov's theorem in the formulation of Esseen (Gnedenko and Kolmogorov 1949, p. 216)

$$|H - G_\sigma| \le \frac{c_{12}}{\sigma^3}\sum_k M|\xi_k|^3. \qquad (1.2)$$

Lemma 6 can be obtained from (1.2) if the adjoined normal summand with dispersion $\sigma_2^2$ is represented as a sum of a large number of summands with sufficiently small dispersions.


The proof of Lemma 8 is somewhat more complicated. 
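The bound of Lemma 7, of order $p$ uniformly in $n$, can be observed numerically. The sketch below sums the absolute differences between binomial and Poisson probabilities; the function name and the chosen pairs $(n, p)$ are illustrative assumptions. The comparison with $2p$ rests on the well-known Barbour and Hall form of the Poisson approximation bound, which is not part of this paper and is used here only as a sanity check.

```python
import math


def binomial_poisson_l1(n, p):
    """Sum over m >= 0 of |C(n, m) p^m (1-p)^(n-m) - exp(-np) (np)^m / m!|."""
    mean = n * p
    poisson_term = math.exp(-mean)
    total = 0.0
    m = 0
    # Beyond m = n the binomial term vanishes and only the Poisson tail remains;
    # that tail decreases geometrically because mean / m < 1 for m > n.
    while m <= n or poisson_term > 1e-18:
        if m <= n:
            binomial_term = math.comb(n, m) * p ** m * (1.0 - p) ** (n - m)
        else:
            binomial_term = 0.0
        total += abs(binomial_term - poisson_term)
        m += 1
        poisson_term *= mean / m
    return total


if __name__ == "__main__":
    for n, p in ((10, 0.1), (50, 0.2), (200, 0.05), (400, 0.5)):
        print(n, p, binomial_poisson_l1(n, p) / p)
```

The printed ratios remain bounded as $n$ varies, in line with a bound that depends on $p$ alone.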


2. PROOF OF THEOREM 1

1. Hereafter we shall assume $n > 1$. It is easy to see that this restriction is immaterial.

2. Further we shall assume that $\xi_k$ are non-decreasing functions

$$\xi_k = F^{-1}(\eta_k)$$

of mutually independent variables $\eta_k$ with uniform distribution on the interval $[0, 1]$. It is easy to prove that in an extended probability field $(\Omega, \mathfrak{M}, P)$ such quantities $\eta_k$ exist.

3. Suppose

$$p = n^{-1/3},$$

$$\mu_k = \begin{cases} 0 & \text{if } p/2 \le \eta_k \le 1 - p/2 \\ 1 & \text{otherwise,} \end{cases}$$

$$a = M(\xi_k \mid \mu_k = 0), \qquad \sigma^2 = D(\xi_k \mid \mu_k = 0),$$

$$A(x) = P\{\xi_k < x \mid \mu_k = 0\}; \qquad B(x) = P\{\xi_k < x \mid \mu_k = 1\}.$$

Under the transition from the quantities $\xi_k$ to

$$\xi_k' = \xi_k - a$$

all these constructions remain unchanged. Only instead of $a$ we have $a' = 0$ and the functions $A(x)$ and $B(x)$ become

$$A'(x) = A(x + a); \qquad B'(x) = B(x + a).$$

Thus it is enough to consider the case

$$a = 0$$

and to this we shall restrict ourselves hereafter.

4. In the decomposition $F(x) = pB(x) + (1 - p)A(x)$ the distribution $A$ is concentrated in the interval $[x^-, x^+]$, where

$$x^- = F^{-1}(p/2), \qquad x^+ = F^{-1}(1 - p/2).$$

The length of the interval is

$$\lambda = x^+ - x^-.$$

The distribution $B$ is outside the interval $[x^-, x^+]$ and each of the rays $(-\infty, x^-]$ and $[x^+, \infty)$ has probability $\tfrac12$ under $B$. To the distributions $B^m$ (here and elsewhere power is in the sense of convolution), it is possible to apply Lemma 1, which yields the estimate

$$Q_{B^m}(\lambda) \le c_5\,m^{-1/2}. \qquad (2.1)$$

5. We shall approximate the distribution

$$H = [pB + (1 - p)A]^n = \sum_m C_n^m p^m (1 - p)^{n-m} B^m * A^{n-m}$$

by an infinitely divisible distribution $D$ in two different cases

(A) $\lambda \ge \sqrt{n}\,\sigma$; (B) $\lambda < \sqrt{n}\,\sigma$.
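The binomial expansion of $H$ as a mixture of convolutions $B^m * A^{n-m}$ can be verified on a small discrete example. In the sketch below the distributions `central` (playing the role of $A$) and `tails` (playing the role of $B$), the weight `p` and the number of summands `n` are all hypothetical values chosen for illustration; only the algebraic identity itself comes from the text.

```python
import math


def convolve(p, q):
    """Convolution of two discrete distributions given as {atom: probability}."""
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, 0.0) + px * qy
    return out


def power(p, m):
    """m-fold convolution power; the zeroth power is the unit mass at 0."""
    result = {0: 1.0}
    for _ in range(m):
        result = convolve(result, p)
    return result


def mixture(weight, tails, central):
    """The distribution weight * tails + (1 - weight) * central."""
    out = {}
    for atom, prob in tails.items():
        out[atom] = out.get(atom, 0.0) + weight * prob
    for atom, prob in central.items():
        out[atom] = out.get(atom, 0.0) + (1.0 - weight) * prob
    return out


def direct_power(weight, tails, central, n):
    """H computed as the n-fold convolution of pB + (1 - p)A."""
    return power(mixture(weight, tails, central), n)


def binomial_expansion(weight, tails, central, n):
    """H computed as the sum over m of C(n, m) p^m (1-p)^(n-m) B^m * A^(n-m)."""
    out = {}
    for m in range(n + 1):
        coeff = math.comb(n, m) * weight ** m * (1.0 - weight) ** (n - m)
        for atom, prob in convolve(power(tails, m), power(central, n - m)).items():
            out[atom] = out.get(atom, 0.0) + coeff * prob
    return out


def max_abs_difference(first, second):
    """Largest pointwise difference between two discrete distributions."""
    atoms = set(first) | set(second)
    return max(abs(first.get(a, 0.0) - second.get(a, 0.0)) for a in atoms)


if __name__ == "__main__":
    central = {-1: 0.25, 0: 0.5, 1: 0.25}   # hypothetical A
    tails = {-5: 0.5, 6: 0.5}               # hypothetical B
    print(max_abs_difference(direct_power(0.2, tails, central, 6),
                             binomial_expansion(0.2, tails, central, 6)))
```

The identity holds because convolution is commutative and distributes over mixtures, so the $n$-th power expands exactly as an ordinary binomial.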


Case (A): Suppose 
m 
D= epin (B—B))- 2 “enn, 


m 


H, = 5 dc p™(1—p)" "B". 
m 


According to Lemma 7 cat (22) 


|D—H,| < nP = t’ ù 


- By Lemma 4 and estimate (2.1) when 4 = A we have” 


| B”x A" m—B"] E J | A""(e—2) —E(w—2) | B” (d) 


my)— =, 2.3) 
< Qpel) B a SUP | A™-"(y) Hy) | < ests ™ ( 
Thus |H-Hy| <E oep” —p)" BU An —B™| 
<L C50 nd 8127”, (2.4) 
1 5 ai "= Piu < 3098). 
where =z pasë n A 
Observing that

$$M\mu = np = n^{2/3}, \qquad D\mu = np(1 - p) \le n^{2/3},$$

we obtain by Chebyshev's inequality

$$\Sigma' \le P\{|\mu - np| \ge \tfrac12 np\} \le 4n^{-2/3} \qquad (2.5)$$

which, together with (2.4), leads to the inequality

$$|H - H_1| \le (2c_5c_6 + 8)\,n^{-1/3}. \qquad (2.6)$$

From (2.6) and (2.2) we get (0.1).

* The dispersion of the distribution $A^{n-m}$ is equal to $(n - m)\sigma^2$ so that the conditions of Lemma 4 are satisfied when $\lambda \ge \sqrt{n}\,\sigma$.
Case (B): Suppose 
D = explnp(B—E)):G 


n(l—n)et 


=g ,- 
Sh m! r ipjo 


A, =È dp” (1—p) BA 
m 


(n— m)at 


H, = E cap” — p) "BM eG, p 


By Lemma 7 we have 


|D—H;| K cyn”. za B8) 


The difference H—H,, is estimated with the help of Lemma 5, where we assume now 


h = Vno 
| B”xA”-”— Br 


(n—m)o? | 


< J LArMa—z)G (w—z) | B”(dz) 


(n— in)o2 


< n D S —M = E 
KO pnlvV 7) Kë as ESE (ri) no I A” (y) Gn —m)o2 (y) | 
Ko na mc A = Cem. 

A VMNT 


This estimate is completely analogous to the estimate (2.3) of case (A). Exactly in the 
same manner as in case (A), we get 


| H—H,| < egon 82” K cn, we (2.9) 
The difference H,—H, is estimated with the help of Lemma 3 : 


|H,—H,| < cygn49.5" 


where D= pH cp” “1 — pyet, 


It is possible to obtain from (2.5) and (2.6), 


h a is a 
with the help of Ghebyshev”s inequality E acai 


the estimate 
2" K cym 
which leads to jë 


(Ceca) 8, v (2.10) 
From (2.8), ( 


2 9 
9), (2.10) we obtain (0.1) which completes the proof of Theorem 1. 
3. PROOF OF THEOREM 2

1. Without loss of generality, we may consider $\varepsilon < 1$.

2. We shall show that it is enough to consider the case of continuous and strictly increasing functions $F_k(x)$.

Suppose that Theorem 2 has been proved with constant $c_2'$ for the case of continuous and strictly increasing functions. We consider the sum

$$\xi = \xi_1 + \dots + \xi_n$$

with arbitrary $F_k(x)$ satisfying (0.2). Let $L > 2l$.


We choose J’ and L’ such that 


' 


, i ER 
bs > BP > 2, P7IL (3.1) 


By Lemma 2 it is possible to choose such a small qy that, for any distribution function 
F(x) the inequalities 
PG jo(u—A)—€ < F(x) < F* < Goz (w-+A)-+€ ve (3.2) 
0 
Fig, ça < Fle) < FPG, ata) (3.3) 
where Vato, AS 


6’ = Cg max ja (log a (22"9)] 


l ës * 
are fulfilled. Let Fy, = Fo: 


By (0.2) and (3.2) we get 

B(a—l’)—2e < Fue) < Elu-+l')+2e. 
are continuous and strictly increasing, there exists an infinitely 
for which 
pys < He) < Di). ne (8.4) 


Since the function Fg 
divisible distribution D 
D'(e— 

H' = H* po? 

Observing that 0 
= A(x) and (3.4), we obtain 


< Hx) < Di(w-+-L)+20" (3.4a) 


from (3.3) applied to F(x) 
D'(e—D)—2ë 


gë” K 205 MAX [z (8 ie (2e)"°] 


< Cy max [+(e a an). 


follows from 
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since, by (3.1) 


(3.48.) 


,. (0.8 
when c, is chosen as above, (0-3) 
3. In accordance with 2 we shall hereafter consider the case when $F_k(x)$ are continuous and strictly increasing. Then the function

$$\lambda_k(p) = F_k^{-1}(1 - p/2) - F_k^{-1}(p/2)$$

will be well-defined for all $p$, $0 < p < 1$, continuous and strictly decreasing. It takes all positive values. Therefore the inverse function $p_k(\lambda)$ is defined on $0 < \lambda < \infty$ as a continuous strictly decreasing function taking all values in the interval $1 > p > 0$.

The function

$$s(\lambda) = \sum_k p_k(\lambda)$$

is also continuous and strictly decreasing. It takes all values $n > s > 0$. Thus, when $0 < \varepsilon < 1$, there exists a unique solution $\lambda_0$ for the equation $s(\lambda) = \varepsilon^{-2/3}$.
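Since $s(\lambda)$ is continuous and strictly decreasing, the solution $\lambda_0$ can be located by bisection. The sketch below assumes, purely for illustration, that the $F_k$ are centred normal distributions with hypothetical standard deviations `sigmas`; in that case $\lambda_k(p)$ is twice the upper $p/2$ quantile and its inverse has the closed form used in `p_k`. It also assumes $\varepsilon^{-2/3} < n$, so that the target value lies in the range of $s$.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_k(lam, sigma):
    """Inverse of lambda_k(p) = F_k^{-1}(1 - p/2) - F_k^{-1}(p/2) for F_k = N(0, sigma^2)."""
    return 2.0 * (1.0 - normal_cdf(lam / (2.0 * sigma)))


def s(lam, sigmas):
    """s(lambda) = sum over k of p_k(lambda)."""
    return sum(p_k(lam, sigma) for sigma in sigmas)


def solve_lambda0(eps, sigmas, tol=1e-12):
    """Unique root of s(lambda) = eps^(-2/3), found by bisection."""
    target = eps ** (-2.0 / 3.0)
    if not 0.0 < target < len(sigmas):
        raise ValueError("eps^(-2/3) must lie strictly between 0 and n")
    lower, upper = 0.0, 1.0
    while s(upper, sigmas) > target:   # enlarge the bracket until s drops below the target
        upper *= 2.0
    while upper - lower > tol:
        middle = 0.5 * (lower + upper)
        if s(middle, sigmas) > target:  # s is decreasing, so the root lies to the right
            lower = middle
        else:
            upper = middle
    return 0.5 * (lower + upper)


if __name__ == "__main__":
    sigmas = [0.1 + 0.01 * k for k in range(50)]   # hypothetical scales, n = 50
    lam0 = solve_lambda0(0.05, sigmas)
    print(lam0, s(lam0, sigmas), 0.05 ** (-2.0 / 3.0))
```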


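4. Let

$$\lambda = \begin{cases} \lambda_0 & \text{if } \lambda_0 \ge l \\ l & \text{if } \lambda_0 < l, \end{cases}$$

$$p_k = p_k(\lambda), \qquad s = s(\lambda) = \sum_k p_k,$$

$$x_k^- = F_k^{-1}(p_k/2), \qquad x_k^+ = F_k^{-1}(1 - p_k/2),$$

$$\mu_k = \begin{cases} 0 & \text{if } x_k^- \le \xi_k \le x_k^+ \\ 1 & \text{otherwise,} \end{cases}$$

$$a_k = M(\xi_k \mid \mu_k = 0), \qquad \sigma_k^2 = D(\xi_k \mid \mu_k = 0), \qquad \sigma^2 = \sum_k (1 - p_k)\,\sigma_k^2.$$

Putting

$$A_k(x) = P\{\xi_k < x \mid \mu_k = 0\}; \qquad B_k(x) = P\{\xi_k < x \mid \mu_k = 1\},$$

we represent $F_k(x)$ in the form

$$F_k(x) = p_k B_k(x) + (1 - p_k)A_k(x)$$

where the distribution $A_k$ is concentrated in the interval $[x_k^-, x_k^+]$ and $B_k$ outside this interval such that the rays $(-\infty, x_k^-]$ and $[x_k^+, \infty)$ have probability $\tfrac12$ each.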


Using the notations of Lemma 8 and putting*

$$B(m) = \prod_k{}^{*}\, B_k^{m_k}, \qquad A(m) = \prod_k{}^{*}\, A_k^{1-m_k},$$

we obtain

$$H = \prod_k{}^{*}\,[p_k B_k + (1 - p_k)A_k] = \sum_m p(m)\,B(m) * A(m).$$

* $B(m)$ is defined for arbitrary non-negative $m_k$ but $A(m)$ only in the case when the $m_k$ take the value 0 or 1.
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The construction of the a i ing i i ivisi 
approximating infinitely divisible distributi i i 
Pg ee ee y divisible distribution will be different 


> 
AZI |d <0 
A= |A=A JAzi 
s = 8/3 |s — 28 |s —e25 
5. Since $\lambda \ge l$ always

$$p_k = p_k(\lambda) \le p_k(l) \le \varepsilon. \qquad (3.5)$$

It is only here that we appeal to conditions (0.2) of the theorem in our proof. Since the definition of $p_k(\lambda)$ and all other essential constructions are invariant with respect to shifts

$$\xi_k' = \xi_k - c_k$$

we can restrict ourselves to the case

$$a_k = 0.$$

6. We shall make some more simplifications which we need in future. 
Let c=Suh, wh) = My 

E : 
ris equal to the number of times čr lies in the interval të < čp Kaf. It is easy to 


verify that 
Mr=s 
Dr= = pi—Pr) < $- 


qualities 


Thus, by Chebyshev’s ine 
pm) Z Ba (3.6) 


= x 
P{|7—s| > 4 uam- 2¢ cë 


7. Let further 
ë-X (1—/x) Ér 
k 
hich lie in the interval ty < Čr < at. Because of [the 


Tt is tho sum of all those br w 
— 0 for all m 


assumption 4£ E nit wane ae 
Furth re interested in the conditional dispersions 
urther we a 
i), 


oñ) = DJE = ™ 
p= o(p) 


m variable 
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it is easy to verify that 


Mp? = 0”. 
Since P A 
X(1—p,)oz = 0° 
A2 
21 
TËS z 
Pr = PAA) < pill) < E 
we have De < o”N”e 
2A2£ 
Thus P(|p?—o?| > c) = z pm) < E 2 vës (8.7) 
lo?(m) -02| Ke 8 
8. Finally, ve observe that by Lemma 1 
T çe 
Oret) < cs KM). so (88) 


We immediately pass on to the proof of Theorem 2 in cases (A), (B) and (C). 
Case (A): In this case 
H= i [74,.B,+(1—p,)A,] = E qm) Bim) x A(m) 
is approximated by D = exp z PilBj,— E) = £ q(m) Bim). 
In order to pass on to D from H we further consider e 
H, = È pËr) Bm), 


By Lemma 8 and (3.5) 
|D—H,| <= |p(m)—a)| < c È vi 
< 


m 
CE x Pr K Ca E. ne (3.9) 


On the other hand, by Lemma 4 when h = A = Ay > 7, and (3.8) 


| BG)*A(m)— Bim) | < J | A(m)(e—z)— E(x—z) | B(m)(dz) 
< Qaim (Ao) X sup | A(m)(y)—E(y) | 
T rho KY K (7T+1) do 
K egos). ... (3-10) 
ae |H—-H,| < X p(m)| Bm)*A(m)—B(m)| 
S V2 og cge-V3-. 25! s (8.11) 
where ye z più. 


tm) < qe” 


Observing that in our case s = £°/8, we have by (3.6) 


E’ < 8-2/3 = 4g1/3 
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i.e., from (3.11) it follows that 


|H—H,| < (vV2eses--S)e18. ves (G) 
From (3.9) and (3.12) we obtain 


|H—D| < cë”. se (B03) 
Case (B): In this case we put 


D = exp È p,(B,—E)*Ge = E dl) Bl)” Goe, 


m 


Hy = E phm) B)* Ge (m), 


m 


H, = 5 pit) Bim)*Gor. 


m 
The inequality | D—H,| < gets we (3.14) 
is obtained exactly in the same way as (3.9) was in case (A). 
By Lemma 5 when h =o >A=A, and (3.8) we obtain 
| Bim)*A(m) —_Blm)*Go2(™) | 
< S| A(a)(w—2) —Goxymm) (2) | Bien (dz) 


<Q (0) E sup | Alm) —Geam) | ae (3015) 


r rosy (rl) 


K 6, C [AT] . Co Agjo = CsColt(m) J>. 
Exactly in the same way as (3.12) was obtained from (3.10) in case (A), we obtain 
from (3.15) |H—H,| Š (0509 -F8)E18. ae (3.16) 


3 
It remains to estimate the difference H,—Hy, By Lemma 


= 1 
| Qo, — Qoam) I < Cyg& 


Co, 


m) il< 
Cy 


SS 
g? | 


tp of (3.7) and taking into account that now A < 7 we obtain the 
With the help o V- 


estimate 


pë sae LETI 
za. (3.17) 
Thus, analogous to the deduction of (2.10) in the proof of Theorem 1, we obtain


IEJ—H < (¢19+-¢29) 8%. 
From (3.14), (3.16) and (3.17) it follows that 
|H—D]| < cg, eV. ee (3,18) 


The inequalities (3.12) and (3.18) show that in cases (A) and (B) the estimate (0.3), 
constituting the content of Theorem 2 can be changed to the stronger one : 


- D(æ)—ca£"3 < H(x) < D(x)+ cz". “e (8:19) 


Case (C): In this case, getting an estimate of the type (3.19) does not work. 
We suppose that ` 


1 Ly-t 
= —— Li log — 3.20 
va Ze 2) (3.20) 
and introduce an auxiliary distribution 
E = H*G.,. 
By Lemma 2, if q= qe Të — yr, vs (8.21) 
then H'(e@—L)—9 < He) < H'(w+L)+. we (3.22) 
Further we shall show that the infinitely divisible law 
D = Zim) Binay o} = otto? 
satisfies the inequalit; D-H'|\< jt uyi 
quality | ETA I a4 (log 7) |. ve (3.28) 


Together with (3.21) and (3.22), (3.23) yields (0.3). 


It remains to prove (3.23). To this end we introduce the distributions


H = Xp (m) BTG m Am) = 0(m)+02 
H! = — 
2 pa pm) Bn) *O g. 
The proof of the inequality 
|D—H3\ < c,,813 
is exactly the same as the proof of (3.9) and (3.14) in cases (A) and (B). But now


s=} pers 
E 


which leads to the estimate 


Ipi < SË py < ev 
d k 


By Lemma 3 | Hi—H3| K egjeV2--X” pt) ne (3.25) 


where X” is taken over those p(m) for which 


Am) 


Cog 
oF = ees we (3.26) 


Cy 


With the above choice of ca, it follows from (3.25) that 
| o3(m)—o} | = |o%(m)—o*) | > o}, 
Thus by (3.7) and the inequalities the second inequality is obtained from (3.20). 
o<oA=I< Ko 


we have a” < Tegi Sa EU 
Thus from (3.25) we obtain 
| Hj—Hs| < (Coy-+1)e. ws (3.27) 
Finally, by Lemma 6 and (3.20) we obtain 
a ae 
[Amao —Cozm)l K Go Yoo S V2 Cro (+ (ter) J 
+ 
1 , > l L 
H—HIS Vace lig- 
Ne a V2 2 (1087) ni (3.28) 


From (3.24), (3.27) and (3.28), (3.23) follows immediately. This completes the proof of Theorem 3.


APPLICATIONS OF CHARACTERISTIC FUNCTIONS 
IN STATISTICS* 


By EUGENE LUKACS 
The Catholic University of America 


SUMMARY. Applications of characteristic functions, such as the distribution problem of statistics and various characterization and regression problems, are discussed. In Section 6 the binomial and the negative binomial population are characterized by a regression property.


1. INTRODUCTION


Characteristic functions and generating functions were originally developed for the solution of certain problems in probability theory, in particular for the study of the limit distributions of sums of independent random variables. In the present paper we wish to show that there exists also a large variety of problems in mathematical statistics which can be approached by the method of characteristic functions. A certain familiarity with the theory of characteristic functions and their applications is therefore a useful asset for the mathematical statistician.

We do not intend to give here a comprehensive survey of the applications of characteristic functions in mathematical statistics. Our aim is more modest; we wish to indicate only a few typical results in this area. The present paper is largely expository, but we will occasionally mention open problems and will also discuss, in Section 6, a characterization of two discrete distributions which is probably new.

In numerous branches of science one is interested in quantities which are subject to random fluctuations. Series of independent measurements are performed under identical conditions in order to obtain data which should permit conclusions concerning the phenomenon under investigation. The results of the measurements can be treated as independently and identically distributed random variables, say $X_1, X_2, \ldots, X_n$. In such a situation we say that the observations $X_1, X_2, \ldots, X_n$ form a sample of size $n$ drawn from a certain population. We will call the common distribution function of these random variables the population distribution function.

The statistician introduces functions of the observations and studies their properties in order to arrive at conclusions concerning the population. These functions cannot be quite arbitrary; they are only useful if they are also random variables. This consideration necessitates the restriction to single valued and measurable functions of the observations. These functions are called statistics. The statistical problems which we intend to discuss in this paper will usually be formulated in terms of certain statistics.


* This paper was prepared with support from the National Science Foundation grant NSF-GP-96.
A few remarks concerning our notation follow:

We will use capital letters taken from the end of the alphabet for random variables; distribution functions will be denoted by $F(x)$, $G(x)$, etc., while $f(t)$, $g(t)$, etc. are the notations for the corresponding characteristic functions. Random variables, distribution functions and characteristic functions may be written with indices which will indicate their connection.


2. THE DISTRIBUTION PROBLEM OF STATISTICS 


Let $X_1, X_2, \ldots, X_n$ be a sample of size $n$ drawn from a population with population distribution function $F(x)$ and denote a statistic by $S = S(X_1, X_2, \ldots, X_n)$. The statistic $S$ is a random variable; we write $F_S(x)$ for its distribution function and

$$f_S(t) = \mathcal{E}(e^{itS}) = \int_{-\infty}^{\infty} e^{itx}\, dF_S(x)$$

for its characteristic function. The problem of determining the distribution function of a statistic for a given population distribution function is called the distribution problem of statistics.

An important result from the theory of characteristic functions permits one to determine the characteristic function of a statistic $S$ by means of the relation¹

$$f_S(t) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \exp[\,itS(x_1, x_2, \ldots, x_n)]\, dF(x_1)\, dF(x_2) \cdots dF(x_n). \tag{2.1}$$

The corresponding distribution function $F_S(x)$ is obtained by means of the inversion formula

$$F_S(x+h) - F_S(x) = \lim_{T \to \infty} \frac{1}{2\pi} \int_{-T}^{T} \frac{1 - e^{-ith}}{it}\, e^{-itx} f_S(t)\, dt \tag{2.2}$$

provided that $[x, x+h]$ is a continuity interval of $F_S(x)$. If $f_S(t)$ is absolutely integrable over $(-\infty, +\infty)$ then

$$F_S'(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx} f_S(t)\, dt \tag{2.2a}$$

yields the frequency function of $S$. It follows from (2.1) and (2.2) that the distribution $F_S(x)$ of the statistic $S$ is uniquely determined by the population distribution function $F(x)$. Thus we have obtained in principle a solution of the distribution problem of statistics.² However, the practical usefulness of this method depends on the possibility of evaluating the integrals (2.1) and (2.2) [respectively (2.2a)].
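The practical side of (2.2a) can be illustrated numerically. The following is only a minimal sketch, not part of the paper: it assumes NumPy, truncates the infinite integral to a finite interval, and replaces it by a Riemann sum. The function name `density_from_cf` and the parameters `t_max` and `n_steps` are hypothetical choices made for this illustration.

```python
import numpy as np

def density_from_cf(cf, x, t_max=40.0, n_steps=8001):
    """Approximate (2.2a): F_S'(x) = (1/2pi) * integral of exp(-itx) f_S(t) dt.

    Assumptions of this sketch: the integral is truncated to [-t_max, t_max]
    and replaced by a Riemann sum on a uniform grid.
    """
    t = np.linspace(-t_max, t_max, n_steps)
    dt = t[1] - t[0]
    x = np.atleast_1d(np.asarray(x, dtype=float))
    integrand = np.exp(-1j * np.outer(x, t)) * cf(t)
    return (integrand.sum(axis=1) * dt).real / (2.0 * np.pi)

# Example: S standard normal, with characteristic function exp(-t^2 / 2).
normal_cf = lambda t: np.exp(-0.5 * t**2)
xs = np.array([-1.0, 0.0, 2.0])
approx = density_from_cf(normal_cf, xs)
exact = np.exp(-0.5 * xs**2) / np.sqrt(2.0 * np.pi)
```

For a rapidly decaying characteristic function such as the normal one, the truncation is harmless; for slowly decaying characteristic functions the truncation point would have to be chosen with more care.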
We mention next a few examples in which the distribution problem can be treated in this manner.

1 See Laha and Lukacs (1963), Theorem 1.5.8.
2 The method of characteristic functions was systematically applied for the solution of distribution problems by Kullback (1934) and (1936).
Let $X_1, X_2, \ldots, X_n$ be a sample from a certain population with characteristic function $f(t)$ and write $\bar{X} = \sum_{j=1}^{n} (X_j/n)$ for the sample mean. The well-known convolution theorem permits the derivation of an explicit formula for the characteristic function $f_{\bar{X}}(t)$ of $\bar{X}$. One obtains

$$f_{\bar{X}}(t) = \left[ f\!\left(\frac{t}{n}\right) \right]^n. \tag{2.3}$$

The distribution function $F_{\bar{X}}(x)$ is uniquely determined by (2.3); the inversion formula yields

$$F_{\bar{X}}(x+h) - F_{\bar{X}}(x) = \lim_{T \to \infty} \frac{1}{2\pi} \int_{-T}^{T} \frac{1 - e^{-ith}}{it}\, e^{-itx} \left[ f\!\left(\frac{t}{n}\right) \right]^n dt \tag{2.4}$$

provided that $[x, x+h]$ is a continuity interval of $F_{\bar{X}}(x)$. This formula is of practical importance whenever it is possible to evaluate the integral in (2.4). If the population distribution is absolutely continuous, then this is also true for $F_{\bar{X}}(x)$; if in addition $f(t)$ is absolutely integrable then one can determine the probability density of $\bar{X}$ by means of the Fourier inversion formula (2.2a) and obtains

$$F_{\bar{X}}'(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx} \left[ f\!\left(\frac{t}{n}\right) \right]^n dt. \tag{2.4a}$$


We use a different inversion formula* in case the characteristic function $f(t)$ of an absolutely continuous population distribution function is not absolutely integrable; we obtain then

$$F_{\bar{X}}'(x) = \lim_{T \to \infty} \frac{1}{2\pi} \int_{-T}^{T} \left( 1 - \frac{|t|}{T} \right) e^{-itx} \left[ f\!\left(\frac{t}{n}\right) \right]^n dt. \tag{2.4b}$$

We obtain already from formula (2.3), even without using an inversion formula, interesting results concerning the distribution of the sample mean in several populations. We see immediately that the sample mean of a sample drawn from a normal population $N(\alpha, \sigma^2)$ has also a normal distribution with mean $\alpha$ and variance $\sigma^2/n$. Similarly, in a Gamma population with parameters $\theta$ and $\lambda$, the sample mean has also a Gamma distribution with parameters $n\theta$ and $n\lambda$. On the other hand one can see that the Cauchy population has the interesting property that the distribution function of the sample mean is identical with the population distribution function. Other populations where the distribution of the mean can be studied in this manner are the rectangular and the Bessel function populations [see Laha and Lukacs (1963)]. The frequency function of the sample mean in Pearson type I, type VII and type II populations was studied by Irwin (1927, 1929, 1930). The distribution of the geometric mean of a sample can be investigated in a similar manner; the geometric mean of a sample drawn from a rectangular population was studied by Schulz-Arenstorff and Morelock (1959). The distribution problem for the geometric mean of a Gamma population was treated by Kullback (1934).

* See for instance Goldberg (1961), Theorem 6C. We note that (2.4b) is valid also when $f(t)$ is not absolutely integrable.
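The three statements about the normal, Gamma and Cauchy populations can be checked directly from (2.3). The sketch below is an illustration only. It assumes NumPy, and it assumes the parametrization $f(t) = (1 - it/\theta)^{-\lambda}$ for the Gamma characteristic function, which the text does not spell out; the helper name `mean_cf` and all numerical values are hypothetical.

```python
import numpy as np

def mean_cf(cf, n):
    """Characteristic function of the sample mean via (2.3): t -> [f(t/n)]^n."""
    return lambda t: cf(t / n) ** n

t = np.linspace(-5.0, 5.0, 201)
n = 7

# Normal population N(alpha, sigma^2): the mean is N(alpha, sigma^2 / n).
alpha, sigma = 1.5, 2.0
normal = lambda t, a=alpha, s=sigma: np.exp(1j * a * t - 0.5 * (s * t) ** 2)
normal_ok = np.allclose(mean_cf(normal, n)(t),
                        normal(t, alpha, sigma / np.sqrt(n)))

# Gamma population, assumed f(t) = (1 - it/theta)^(-lam):
# the parameters of the mean become n*theta and n*lam.
theta, lam = 2.0, 1.3
gamma = lambda t, th=theta, la=lam: (1.0 - 1j * t / th) ** (-la)
gamma_ok = np.allclose(mean_cf(gamma, n)(t), gamma(t, n * theta, n * lam))

# Cauchy population, f(t) = exp(-|t|): the mean has the same law.
cauchy = lambda t: np.exp(-np.abs(t))
cauchy_ok = np.allclose(mean_cf(cauchy, n)(t), cauchy(t))
```

Each flag compares two characteristic functions on a grid of $t$ values; agreement on a grid is of course only a numerical plausibility check, not a proof.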
Let $X_1, X_2, \ldots, X_n$ be a sample from a certain population and let $Q = Q(X_1, X_2, \ldots, X_n)$ be a statistic. It is then interesting to find conditions which assure that the distribution function of $Q$ determines the population distribution function. For instance, if $Q$ is a linear statistic which is normally distributed, then the population distribution function is necessarily also normal. Linnik (1956) considered a rather general statistic $Q$ (subject to certain restrictions) and assumed also that the population distribution function satisfies certain conditions. He derived several theorems which deal with situations in which the knowledge of the distribution function of $Q$ assures that the population distribution function belongs to a certain family of distributions.


The method of characteristic functions is also very useful in the study of distribution problems of normal populations. The distributions of many important statistics can be derived by using (2.1) and (2.2). We mention here only as examples the chi-square and the non-central chi-square distributions. Characteristic functions can also be used to derive necessary and sufficient conditions which assure that a quadratic polynomial in independently distributed normal variables has a chi-square (respectively a non-central chi-square) distribution. It is also possible to obtain by the method of characteristic functions the distribution of the quotient $X/Y$ of two independent random variables $X$ and $Y$ whose distributions are known. Considering the quotient of two normally distributed random variables we obtain the Cauchy distribution. The distribution of the quotient of two random variables, each of which has a Gamma distribution, can be studied in a similar manner; the F-distribution and Student's distribution are obtained as particular cases. The converse problem of determining the families of random variables whose quotient follows these laws was studied by several authors. Mauldon (1956), Steck (1958) and Kotlarski (1960) gave examples of random variables whose quotient follows the Cauchy law. Laha (1959a) and (1959b) obtained some general theorems in this area and started also a study* of similar problems related to the F-distribution.


Distribution problems in multivariate populations can also be treated by the method of characteristic functions. We mention here only one example:

Theorem 2.1: Let $X = (X_1, \ldots, X_p)$ be a random vector with non-singular $p$-variate normal distribution with mean vector $(0, 0, \ldots, 0)$ and variance-covariance matrix $A$ and let $M$ be a symmetric $p \times p$ matrix and $Q = XMX'$ be a quadratic statistic. The characteristic function of $Q$ is then

$$f_Q(t) = \mathcal{E}(e^{itQ}) = \mathcal{E}(e^{itXMX'}) = |I - 2itAM|^{-1/2}$$

where $I$ is a $p \times p$ matrix whose diagonal elements are 1 while all other elements are zero.

It follows then that $XMX'$ has a chi-square distribution with $r$ degrees of freedom if, and only if, $AM$ has rank $r$ and has exactly $r$ eigenvalues equal to 1.
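Theorem 2.1 lends itself to a numerical sanity check. The sketch below is an illustration under stated assumptions, not the author's method: it assumes NumPy, evaluates the determinant as a product over the eigenvalues of $AM$ (taking the principal branch of each square root), and compares the result with a Monte Carlo estimate. The matrices `A` and `M`, the sample size and the function name `quadratic_form_cf` are hypothetical.

```python
import numpy as np

def quadratic_form_cf(t, A, M):
    """Evaluate |I - 2it A M|^(-1/2) as a product over the eigenvalues of A M.

    With A = C C' (Cholesky), A M has the same eigenvalues as the symmetric
    matrix C' M C, so they are real. Each factor uses the principal branch.
    """
    C = np.linalg.cholesky(A)
    eig = np.linalg.eigvalsh(C.T @ M @ C)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return ((1.0 - 2j * np.outer(t, eig)) ** (-0.5)).prod(axis=1)

rng = np.random.default_rng(0)
A = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])      # hypothetical covariance matrix
M = np.array([[1.0, 0.2, 0.0],
              [0.2, 0.5, 0.1],
              [0.0, 0.1, 0.3]])      # hypothetical symmetric matrix

# Monte Carlo estimate of E(exp(itQ)) with Q = X M X'.
X = rng.multivariate_normal(np.zeros(3), A, size=200_000)
Q = np.einsum("ij,jk,ik->i", X, M, X)
t0 = 0.3
mc = np.exp(1j * t0 * Q).mean()
exact = quadratic_form_cf(t0, A, M)[0]

# Chi-square criterion: with M = A^{-1}, A M = I has rank 3 and 3 unit eigenvalues,
# so the characteristic function should be that of chi-square with 3 degrees of freedom.
ts = np.linspace(-2.0, 2.0, 41)
chi2_cf = (1.0 - 2j * ts) ** (-1.5)
chi2_ok = np.allclose(quadratic_form_cf(ts, A, np.linalg.inv(A)), chi2_cf)
```

The eigenvalue product is used instead of a direct complex determinant so that the branch of the square root is unambiguous; the Monte Carlo comparison is only accurate to a few times $1/\sqrt{N}$ for $N$ simulated vectors.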


* Not yet published.
3. CHARACTERIZATION PROBLEMS—LINEAR STATISTICS

We consider $n$ independently, but not necessarily identically, distributed random variables $X_1, X_2, \ldots, X_n$ with distribution functions $F_1(x), F_2(x), \ldots, F_n(x)$. If the random variables $X_1, X_2, \ldots, X_n$ are identically distributed with common distribution function $F(x)$, we can consider $X_1, X_2, \ldots, X_n$ as a sample of size $n$, taken from a population with population distribution function $F(x)$. We introduce measurable functions of the random variables and call them statistics (even if the $X_1, X_2, \ldots, X_n$ are not identically distributed). In this section we discuss the problem of determining the distribution functions $F_1(x), F_2(x), \ldots, F_n(x)$ [respectively $F(x)$] from suitable assumptions concerning the distribution of two linear statistics, $L_1 = a_1X_1 + a_2X_2 + \cdots + a_nX_n$ and $L_2 = b_1X_1 + b_2X_2 + \cdots + b_nX_n$. The following two assumptions concerning $L_1$ and $L_2$ were considered extensively:

(a) $L_1$ and $L_2$ are independently distributed,

(b) $L_1$ and $L_2$ are identically distributed.

(a) The problem of two independently distributed linear statistics has a long history; it was solved in full generality by Darmois (1953) and Skitovich (1953 and 1954). These authors formulated the problem in terms of characteristic functions and used the method of finite differences, a technique which was suggested independently by Darmois (1951) and Gnedenko (1948). We state here only the result and refer the reader for a proof either to the papers by Darmois (1953) and Skitovich (1954) mentioned above or to Laha and Lukacs (1963).

Theorem 3.1 (Skitovich): Let $X_1, X_2, \ldots, X_n$ be $n$ independently but not necessarily identically distributed random variables. Suppose that the two linear forms $L_1 = a_1X_1 + a_2X_2 + \cdots + a_nX_n$ and $L_2 = b_1X_1 + b_2X_2 + \cdots + b_nX_n$ are independently distributed. Then each random variable which has non-zero coefficient in both forms is normally distributed.

(b) The problem of identically distributed linear statistics (even in infinitely many random variables) was first studied in an important paper by Marcinkiewicz (1939). A number of very interesting general results were obtained by Linnik (1953). We state here only one of his principal results. To formulate it we introduce the following terminology. Let $L_1 = a_1X_1 + a_2X_2 + \cdots + a_nX_n$ and $L_2 = b_1X_1 + b_2X_2 + \cdots + b_nX_n$; we introduce the entire function $G(z)$ of the complex variable $z$ by

$$G(z) = |a_1|^z + |a_2|^z + \cdots + |a_n|^z - |b_1|^z - |b_2|^z - \cdots - |b_n|^z$$

and call it the determining function. We are now in a position to formulate Linnik's result:

Theorem 3.2 (Linnik): Let $X_1, X_2, \ldots, X_n$ be $n$ independently and identically distributed random variables with common distribution function $F(x)$. Suppose that $L_1 = a_1X_1 + a_2X_2 + \cdots + a_nX_n$ and $L_2 = b_1X_1 + b_2X_2 + \cdots + b_nX_n$ are two linear statistics such that

$$\max_{1 \le j \le n} |a_j| \ne \max_{1 \le j \le n} |b_j|.$$
For the equivalence of the two statements
(I) $F(x)$ is a normal distribution
(II) $L_1$ and $L_2$ are identically distributed
it is necessary and sufficient that the following five conditions be satisfied,
(i) $a_1 + a_2 + \cdots + a_n = b_1 + b_2 + \cdots + b_n$,
(ii) $G(2) = 0$,
(iii) all zeros of $G(z)$ which are integers and are divisible by 4 are simple roots,
(iv) all positive roots of $G(z)$ which are even integers of the form $4n+2$ have a multiplicity not exceeding 2. If there exists such a double root, then it is unique and is the greatest positive root of $G(z)$,
(v) the determining function G(z) can have at most one odd integer positive real 


root y. If such a root exists then it is simple and BA is odd. 
Theorems 3.1 and 3.2 yield characterizations of the normal distribution. 


4. CHARACTERIZATION PROBLEMS—POLYNOMIAL STATISTICS 
Let $X_1, X_2, \ldots, X_n$ be a sample from a normal population and write $\bar{X} = \sum_{j=1}^{n} (X_j/n)$ for the sample mean and $s^2 = \sum_{j=1}^{n} (X_j - \bar{X})^2/n$ for the sample variance. Using the methods discussed in Section 2, one can derive the joint distribution of $\bar{X}$ and $s^2$ and see that $\bar{X}$ and $s^2$ are independently distributed. This fact is of great importance in the derivation of Student's test and was first noted by Fisher (1925). The converse of R. A. Fisher's result is also true and led to the first characterization of a population by the independence of two statistics which we formulate next.

Theorem 4.1: Let $X_1, X_2, \ldots, X_n$ be a sample from a certain population and denote the sample mean by $\bar{X}$ and the sample variance by $s^2$. The statistics $\bar{X}$ and $s^2$ are independently distributed if, and only if, the population is normal.

Theorem 4.1 was first proven under certain unnecessary restrictions which were gradually removed by several authors. For its history we refer the reader to Laha and Lukacs (1963) or Lukacs (1956).

Theorem 4.1 was generalized in two directions: firstly (Laha, 1956) by considering the independence of the mean and a homogeneous quadratic statistic; secondly by considering polynomials of degree exceeding two. Let $k_p$ be Fisher's k-statistic* of order $p$; it can be shown that the normal population is characterized by the independence of $k_p$ and $\bar{X}$.

We mention next a similar result. We denote the sample central moment of order $p$ by $m_p = \sum_{j=1}^{n} (X_j - \bar{X})^p/n$ and suppose that $(p-1)!$ is not divisible by $(n-1)$. The population is normal if, and only if, $m_p$ and $\bar{X}$ are independently distributed.

* The k-statistic $k_p$ of order $p$ ($p$ a positive integer) is a symmetric and homogeneous polynomial statistic of order $p$ such that $\mathcal{E}(k_p) = \kappa_p$ where $\kappa_p$ is the $p$-th cumulant of the population distribution function. It is easily seen that $k_1 = \bar{X}$ and $k_2 = n s^2/(n-1)$. The k-statistics of order higher than 3 are not proportional to central moments.
The proof of this assertion, given in Laha, Lukacs and Newman (1960), requires more powerful tools than the proof of Theorem 4.1. Moreover, it is very likely that the assumption concerning $p$ and $n$ (which is always satisfied if $n$ is sufficiently large) is superfluous and is necessitated only by the technique used in the proof. It would be of interest to modify the proof so as to avoid this restriction.

The first investigations of characterization problems dealt with specific statistics, such as the sample mean and the sample variance, and the hypothesis of their independence.* This assumption was used to determine the population distribution function, except for the numerical value of some parameters. The solution of these problems was usually carried out by deriving a differential equation which the characteristic function must satisfy and by determining those solutions of this differential equation which are characteristic functions. The decision whether a solution of the differential equation is a characteristic function is frequently the most difficult part of this procedure. It appeared therefore desirable to find general properties of characteristic functions which satisfy certain differential equations. This led then also to investigations concerning the analytic properties of characteristic functions which occur in characterization problems.

The following result, concerning differential equations, is due to Zinger and Linnik (1956) and (1957). For its formulation we must introduce certain notations and definitions.


Let

$$\sum A_{j_1 \ldots j_n} f^{(j_1)}(t) \cdots f^{(j_n)}(t) = c\,[f(t)]^n \tag{4.1}$$

be an ordinary differential equation of order $m$. The $A_{j_1 \ldots j_n}$ are real constants and the sum is taken over all non-negative integers $j_1, \ldots, j_n$ which satisfy the relation

$$j_1 + j_2 + \cdots + j_n \le p. \tag{4.2}$$

Here $p$ is an integer such that at least one of the coefficients $A_{j_1 \ldots j_n}$ with $j_1 + \cdots + j_n = p$ is different from zero. We adjoin to the differential equation (4.1) the polynomial

$$A(x_1, \ldots, x_n) = \frac{1}{n!} \sum{}' \sum A_{j_1 \ldots j_n} x_{k_1}^{j_1} \cdots x_{k_n}^{j_n} \tag{4.3}$$

where the first summation $\sum'$ runs over all permutations $(k_1, \ldots, k_n)$ of the first $n$ positive integers while the second summation is taken over all $(j_1, \ldots, j_n)$ which satisfy (4.2).

The differential equation (4.1) is said to be positive definite if its adjoint polynomial (4.3) is non-negative.

Theorem 4.2: Suppose that the function $f(t)$ is, in a certain neighbourhood of the origin, a solution of the positive definite differential equation (4.1) and assume that $m \ge n-1$. If the solution is a characteristic function, then it is necessarily an entire function.

* Occasionally weaker assumptions were used; these will be discussed in Sections 5 and 6.
Theorem 4.2 is a very interesting result concerning the analytic properties of the solutions of certain ordinary differential equations. Moreover, differential equations of the form (4.1) occur in many characterization problems. This is not only the case if one tries to characterize populations by the independence of a polynomial statistic and the sample mean, but also by the weaker assumption (to be discussed in Section 5) that a polynomial statistic has constant regression on the sample mean. However, in applying Theorem 4.2 one is greatly handicapped by the severe restrictions contained in its assumptions. As an example we mention the situation treated by Theorem 4.1. In this case the differential equation is of order $m = 2$, so that the assumption of Theorem 4.2 that $m \ge n-1$ restricts its application to samples of size $n \le 3$. Similar examples can be given and it seems therefore desirable to try to weaken the assumptions of Theorem 4.2. Moreover, it would be useful (for instance in connection with characterizations of the Gamma distribution) to obtain results which use weaker assumptions but state only that the solutions of a differential equation which are characteristic functions are regular in a horizontal strip or in a half-plane.

Theorem 4.2 indicates the recent trend of the characterization problem; this trend can be described as an attempt to derive analytical properties of the characteristic functions rather than to obtain a complete determination of the population. We mention next a few results of this type.


A polynomial* $P(x_1, x_2, \ldots, x_n)$ of degree $m$ is said to be admissible if the coefficients of the terms $x_j^m$ $(j = 1, 2, \ldots, n)$ are not zero.

Theorem 4.3: Let $X_1, X_2, \ldots, X_n$ be $n$ independently (but not necessarily identically) distributed random variables. Let $P(X_1, X_2, \ldots, X_n)$ and $Q(X_1, X_2, \ldots, X_n)$ be two admissible polynomial statistics. If $P$ and $Q$ are independent then each $X_j$ $(j = 1, 2, \ldots, n)$ has finite moments of all orders.

It is possible to obtain more information about the distribution functions of the random variables $X_j$ if one assumes that one of the polynomials is a linear statistic.

Theorem 4.4: Let $X_1, X_2, \ldots, X_n$ be $n$ independently (but not necessarily identically) distributed random variables with characteristic functions $f_1(t), f_2(t), \ldots, f_n(t)$ respectively. Let $P = P(X_1, X_2, \ldots, X_n)$ be an admissible polynomial statistic and let $L = \sum_{j=1}^{n} a_jX_j$ (with $a_j \ne 0$ for $j = 1, 2, \ldots, n$) be a linear form. If $P$ and $L$ are independent then the characteristic functions $f_j(t)$ $(j = 1, 2, \ldots, n)$ are entire functions of finite order.

Theorems 4.3 and 4.4 are due to Zinger (1958).

Suppose that the conditions of Theorem 4.4 hold and that the characteristic functions $f_j(t)$ have no zeros in the entire complex plane. Then it is easy to show that the random variables $X_j$ are normally distributed. Naturally, it is not appropriate to impose such a condition on the characteristic functions. Instead it is desirable to find a condition on the polynomial $P$ which assures that the $f_j(t)$ have no zeros. For the case of identically distributed random variables, Linnik (1956) succeeded in finding

* We assume that similar terms have been collected in every polynomial which we consider.
such a condition. This leads to characterizations of the normal population. We remark that the characterization by the independence of $k_p$ and $\bar{X}$ as well as the characterization by the independence of $m_p$ and $\bar{X}$ can be obtained from these general theorems.

The proof of these results is rather complicated. We refer the reader either to the original papers by Zinger (1958), Linnik (1956) or to Laha and Lukacs (1963).


5. REGRESSION PROBLEMS 


If one examines the proof of Theorem 4.1 then one notices that the assumption of the independence of the statistics $s^2$ and $\bar{X}$ is not fully used. This leads to the question whether the assumptions of Theorem 4.1 could be modified so as to contain no hypotheses which are not needed in the proof. This is possible and results in characterizations of populations by regression properties of certain statistics. We give first a definition and state a lemma which is essential in these studies.

Let $X$ and $Y$ be two random variables and assume that the expectation $\mathcal{E}(Y)$ of $Y$ exists. We write $\mathcal{E}(Y \mid X)$ for the conditional expectation of $Y$, given the value of $X$. Clearly, $\mathcal{E}(Y \mid X)$ is a random variable. It is also called the regression of $Y$ on $X$.

We say that the random variable $Y$ has constant regression on the random variable $X$ if the relation

$$\mathcal{E}(Y \mid X) = \mathcal{E}(Y) \tag{5.1}$$

holds almost everywhere.

Lemma 5.1: Let $X$ and $Y$ be two random variables and suppose that the expectation $\mathcal{E}(Y)$ exists. $Y$ has constant regression on $X$ if, and only if, the relation

$$\mathcal{E}(Y e^{itX}) = \mathcal{E}(Y)\,\mathcal{E}(e^{itX}) \tag{5.2}$$

holds for all real $t$.

If one multiplies (5.1) by $e^{itX}$ and takes the expectation then one sees immediately that (5.2) is a necessary condition and it is not difficult to prove its sufficiency.

In Section 6 we shall consider two random variables $X$ and $Y$ such that

(a) $Y$ has constant regression on $X$,

(b) $\mathcal{E}(Y) = 0$.

In this case we will say that $Y$ has zero regression on $X$. According to Lemma 5.1, the random variable $Y$ has zero regression on $X$ if, and only if, the relation $\mathcal{E}(Y e^{itX}) = 0$ holds for all real $t$.
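Relation (5.2) and the notion of zero regression can be illustrated on a small discrete example. The joint distribution below is hypothetical and chosen only for this sketch (which assumes NumPy): $Y$ has zero regression on $X$ although the two variables are not independent.

```python
import numpy as np

# Hypothetical joint distribution: X uniform on {-1, 0, 1}; given X = x,
# Y = +(1 + |x|) or -(1 + |x|) with probability 1/2 each.
# Then E(Y | X) = 0 = E(Y), yet Y is not independent of X.
points = [(x, s * (1 + abs(x)), 1.0 / 6.0) for x in (-1, 0, 1) for s in (-1, 1)]
xs, ys, ps = (np.array(v, dtype=float) for v in zip(*points))

def lhs(t):
    """E(Y exp(itX)), the left side of (5.2)."""
    return np.sum(ps * ys * np.exp(1j * t * xs))

def rhs(t):
    """E(Y) E(exp(itX)), the right side of (5.2)."""
    return np.sum(ps * ys) * np.sum(ps * np.exp(1j * t * xs))

ts = np.linspace(-10.0, 10.0, 101)
gap = max(abs(lhs(t) - rhs(t)) for t in ts)

# The dependence shows up in the conditional second moments of Y.
ey2_given_0 = np.sum(ps[xs == 0] * ys[xs == 0] ** 2) / np.sum(ps[xs == 0])
ey2_given_1 = np.sum(ps[xs == 1] * ys[xs == 1] ** 2) / np.sum(ps[xs == 1])
```

Here `gap` measures the largest difference between the two sides of (5.2) on the grid, while the two conditional second moments differ, which confirms that constant regression is a weaker property than independence.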
We will now discuss characterizations of populations by the assumption that a polynomial statistic has constant regression on the sample mean (or equivalently on the sum $L = X_1 + X_2 + \cdots + X_n$). Let $X_1, X_2, \ldots, X_n$ be $n$ independently and identically distributed random variables with common distribution function $F(x)$ [that is, $X_1, \ldots, X_n$ is a sample of size $n$ from a population with population distribution function $F(x)$].

Let

$$P = \sum A_{j_1 \ldots j_n} X_1^{j_1} \cdots X_n^{j_n}$$

be a polynomial statistic; the summation is here extended over all $(j_1, \ldots, j_n)$ which satisfy the inequality

$$j_1 + j_2 + \cdots + j_n \le p.$$

Suppose that $P$ has constant regression on $L$; it follows then from Lemma 5.1 that

$$\mathcal{E}(P e^{itL}) = \sum A_{j_1 \ldots j_n}\, \mathcal{E}(X_1^{j_1} \cdots X_n^{j_n} e^{itL}) = \mathcal{E}(P)\,\mathcal{E}(e^{itL}). \tag{5.3}$$

Since $\mathcal{E}(X^j e^{itX}) = i^{-j} \frac{d^j}{dt^j} f(t) = i^{-j} f^{(j)}(t)$ and $\mathcal{E}(e^{itL}) = [f(t)]^n$, where $f(t)$ is the characteristic function of $F(x)$, we conclude from (5.3) that

$$\sum A_{j_1 \ldots j_n}\, i^{-(j_1 + \cdots + j_n)} f^{(j_1)}(t) \cdots f^{(j_n)}(t) = \mathcal{E}(P)\,[f(t)]^n. \tag{5.4}$$

We note that (5.4) is a differential equation for $f(t)$ which has the form (4.1); the result of Theorem 4.2, as well as the unsolved problems mentioned in connection with it, are therefore also of interest if one wishes to characterize a population by the property that a polynomial statistic has constant regression on the sample mean.


If we choose for $P$ the sample variance $s^2$, then (5.4) becomes very simple and we obtain the following result.

Theorem 5.1: Let $X_1, X_2, \ldots, X_n$ be a sample from a population whose distribution function has finite variance $\sigma^2$. The population is normal if, and only if, the sample variance $s^2$ has constant regression on the sample mean $\bar{X}$.
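The computation behind Theorem 5.1 is not spelled out in the text. A sketch of how constant regression of $s^2$ on $L = X_1 + \cdots + X_n$ leads to the normal characteristic function might run as follows; the symbol $\varphi = \ln f$ and the constant $\alpha$ (the population mean) are introduced here only for illustration, and only the direction "constant regression implies normality" is outlined.

```latex
\begin{align*}
s^2 &= \frac{n-1}{n^2}\sum_{j} X_j^2 - \frac{1}{n^2}\sum_{j \neq k} X_j X_k ,
  \qquad \mathcal{E}(s^2) = \frac{n-1}{n}\,\sigma^2 . \\
\intertext{Independence of the $X_j$ and
  $\mathcal{E}(X^j e^{itX}) = i^{-j} f^{(j)}(t)$ give}
\mathcal{E}(X_j^2 e^{itL}) &= -f''(t)\,[f(t)]^{n-1}, \qquad
\mathcal{E}(X_j X_k e^{itL}) = -[f'(t)]^2\,[f(t)]^{n-2} \quad (j \neq k). \\
\intertext{Hence $\mathcal{E}(s^2 e^{itL}) = \mathcal{E}(s^2)\,[f(t)]^n$ reads}
-\frac{n-1}{n}\Bigl( f''(t)\,[f(t)]^{n-1} - [f'(t)]^2\,[f(t)]^{n-2} \Bigr)
  &= \frac{n-1}{n}\,\sigma^2\,[f(t)]^n . \\
\intertext{In a neighbourhood of $t = 0$ where $f(t) \neq 0$, put $\varphi = \ln f$:}
\varphi''(t) = \frac{f''(t)}{f(t)} - \Bigl(\frac{f'(t)}{f(t)}\Bigr)^2 &= -\sigma^2,
  \qquad \varphi(0) = 0, \quad \varphi'(0) = i\alpha, \\
\varphi(t) &= i\alpha t - \tfrac{1}{2}\sigma^2 t^2 .
\end{align*}
```

The last line is the logarithm of the characteristic function of a normal distribution with mean $\alpha$ and variance $\sigma^2$, at least near the origin; extending the identity to all real $t$ requires the additional observation that this $f$ never vanishes.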


The assumption that a random variable has constant regression on a second variable is weaker than the assumption that they are independent. It is therefore not surprising that the property of constant regression permits one to characterize populations for which no characterization by the independence of two statistics is known. We mention here only the Gamma population and state without proof the following result concerning the Poisson population.

Theorem 5.2: Let $X_1, X_2, \ldots, X_n$ be a sample from a population with distribution function $F(x)$. Let $p \ge 1$ and $r \ge 1$ be two integers and assume that

(i) $F(x)$ has moments up to order $p+r$,

(ii) $F(x) = 0$ for $x < 0$ while $F(x) > 0$ for $x > 0$.

The statistic $k_{p+r} - k_p$ has constant regression on $k_1 = \bar{X}$ if, and only if, $F(x)$ is a Poisson distribution.

If we omit assumption (ii), we can characterize a wider family of distributions; it can be shown that this family consists of a convolution of a Poisson distribution, the conjugate to a Poisson distribution and a normal distribution. Any of these three factors may be absent; assumption (ii) assures the absence of the normal and the conjugate Poissonian component. For details we refer the reader to Lukacs (1961).

It would be interesting to obtain similar characterizations of finite convolutions of Poisson type distributions.
In the next section we will discuss in detail the characterization of a population by constant (zero) regression. Before formulating this problem we wish to make a few remarks concerning other regression problems. The characterization problems treated in this section can be extended by considering polynomial regression. The resulting computations become very tedious; some results are known concerning quadratic regression. There are also many interesting investigations concerning linear stochastic structures. A discussion of these topics would exceed the scope of this paper; some results as well as references can be found in Laha and Lukacs (1963).


6. CHARACTERIZATION BY ZERO REGRESSION 


In this section we discuss a characterization of the binomial and of the negative binomial population by the property that a certain polynomial statistic has zero regression on the sample mean. It is convenient to express this polynomial statistic in terms of the power sums

$$s_p = \sum_{j=1}^{n} X_j^p \tag{6.1}$$

as

$$T = n s_4 + (n-4) s_3 s_1 + (3-2n) s_2^2 + s_2 s_1^2 - n s_3 + (n+1) s_2 s_1 - s_1^3. \tag{6.2}$$

This is a symmetric, inhomogeneous statistic of degree 4 which does not seem to admit a simple representation in terms of other, more familiar functions of the observations. We formulate now our result:

., X, be a sample from a population whose distri- 
a finite moment of the third order and denote 


and K, respectively. The characteristic func- 


Theorem 6.1: Let Xy X» -- 


bution function is non-degenerate and has 


the cumulants of order 1 and 2 of F(a) by Kı 
tion f(t) of F(x) has the form 
Ky Ki— Ko pit ere To 
E E E FSHESË e wee by 
fo Eee Ky 1 
if, and only if, the statistic T, given by (6.2), has zero regression on L = X,+X_+...+X,,. 
bution function of f(t) is a binomial distribution if 0 < K/K, < 1 


ve integer; it is a negative binomial distribution if 
ative binomial distribution, shifted by 


The distri 
and n = K3/(kKy—Ke) İS 
0 Z Kft, Kl and is t 


r = KE/(Kg—Ky) in case Kı < 0. 
T in terms of the augmented symmetric functions (see f.i. David 


a positi 
he conjugate to a neg 


We write 


and Kendall (1949)) and obt 
Pë PHa 11] të dë 
X? X—4(n—2) 3 X? NjA 28 X? X; X, 


+(n—2) £ X} Xj;—6E X; Xj X; . (6.4a) 


ain 


dë T = (n—2)% 


mation goes over all subscripts t, j, k, which are different. 


where the sumI 
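The zero-regression property of Theorem 6.1 can be checked exactly on small cases. The sketch below is an illustration and not part of the original argument. It assumes non-negative integer-valued observations and a sample of size 3, and the helper names (`t_statistic`, `regression_numerators` and the three `*_pmf` factories) are hypothetical. For each value $l$ of $L$ it accumulates $E[T \cdot 1(L = l)]$ in exact rational arithmetic; $T$ has zero regression on $L$ exactly when every one of these sums vanishes.

```python
from fractions import Fraction
from itertools import product
from math import comb


def t_statistic(xs):
    """Statistic T of (6.2), written in the power sums s_1, ..., s_4."""
    n = len(xs)
    s1, s2, s3, s4 = (sum(x ** k for x in xs) for k in (1, 2, 3, 4))
    return (n * s4 + (n - 4) * s3 * s1 + (3 - 2 * n) * s2 ** 2
            + s2 * s1 ** 2 - n * s3 + (n + 1) * s2 * s1 - s1 ** 3)


def regression_numerators(pmf, n, max_total):
    """Return {l: E[T * 1(L = l)]} for l = 0, ..., max_total (exact fractions)."""
    totals = {l: Fraction(0) for l in range(max_total + 1)}
    for xs in product(range(max_total + 1), repeat=n):
        l = sum(xs)
        if l > max_total:
            continue
        prob = Fraction(1)
        for x in xs:
            prob *= pmf(x)
        totals[l] += prob * t_statistic(xs)
    return totals


def binomial_pmf(m, p):
    # Binomial with index m and success probability p (p is a Fraction).
    return lambda x: comb(m, x) * p ** x * (1 - p) ** (m - x) if x <= m else Fraction(0)


def negative_binomial_pmf(r, p):
    # Negative binomial with integer r: P(X = x) = C(x + r - 1, x) p^r (1 - p)^x.
    return lambda x: comb(x + r - 1, x) * p ** r * (1 - p) ** x


def uniform_pmf(m):
    # Uniform on {0, ..., m}; not of the form (6.3), used as a counterexample.
    return lambda x: Fraction(1, m + 1) if x <= m else Fraction(0)


p = Fraction(1, 3)
binomial = regression_numerators(binomial_pmf(2, p), n=3, max_total=6)
negbin = regression_numerators(negative_binomial_pmf(2, p), n=3, max_total=5)
uniform = regression_numerators(uniform_pmf(2), n=3, max_total=6)
```

For the binomial and negative binomial populations every entry is zero, while the uniform population on {0, 1, 2} gives a non-zero entry at $l = 4$, so $T$ does not have zero regression there.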
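It follows from Lemma 5.1 that the relation

$$E(T e^{itL}) = 0 \qquad (6.5)$$

holds for all real $t$. Since

$$E(X^k e^{itX}) = i^{-k}\, \frac{d^k f(t)}{dt^k},$$

we obtain easily from (6.5) the differential equation

$$i f''' f' f^{n-2} + i f'' (f')^2 f^{n-3} - 2i (f'')^2 f^{n-2} - f'' f' f^{n-2} + (f')^3 f^{n-3} = 0.$$

Since $f(t)$ is continuous and $f(0) = 1$, there exists a $\Delta > 0$ such that
$f(t) \neq 0$ for $|t| < \Delta$. We restrict in the following $t$ to this neighbourhood
of $t = 0$ and can therefore divide the preceding equation by $[f(t)]^n$ and get

$$i\,\frac{f'''}{f}\,\frac{f'}{f} + i\,\frac{f''}{f}\left(\frac{f'}{f}\right)^2 - 2i\left(\frac{f''}{f}\right)^2 - \frac{f''}{f}\,\frac{f'}{f} + \left(\frac{f'}{f}\right)^3 = 0. \qquad (6.6)$$

We put $\varphi(t) = \ln f(t)$ and obtain the differential equation

$$i\,\varphi'\varphi''' - 2i\,(\varphi'')^2 - \varphi''\varphi' = 0 \quad (|t| < \Delta). \qquad (6.7)$$

The initial conditions to be satisfied are

$$\varphi(0) = 0; \quad \varphi'(0) = i\kappa_1; \quad \varphi''(0) = -\kappa_2. \qquad (6.7a)$$

We obtain, by means of a simple computation,

$$\varphi(t) = \frac{\kappa_1^2}{\kappa_1-\kappa_2}\, \ln\left[\frac{\kappa_2}{\kappa_1} + \frac{\kappa_1-\kappa_2}{\kappa_1}\, e^{it}\right]$$

so that

$$f(t) = \left[\frac{\kappa_2}{\kappa_1} + \frac{\kappa_1-\kappa_2}{\kappa_1}\, e^{it}\right]^{\kappa_1^2/(\kappa_1-\kappa_2)}. \qquad (6.8)$$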
If $f(t)$, as given by (6.8), is a characteristic function, then it is an analytic
characteristic function so that formula (6.8) is valid not only for $|t| < \Delta$ but for
all real $t$.

We still have to find conditions which assure that $f(t)$ is a characteristic function.
We note first that $\kappa_2 = 0$ as well as $\kappa_2/\kappa_1 = 1$ and $\kappa_1 = 0$ lead
to degenerate distributions; these cases are therefore excluded by the assumptions of the
theorem. In discussing (6.8) we must distinguish three cases.

Case 1: $0 < \kappa_2/\kappa_1 < 1$. Then $\kappa_1-\kappa_2 > 0$ and
$n = \kappa_1^2/(\kappa_1-\kappa_2) > 0$. We put $p = \kappa_2/\kappa_1$, $q = 1-p$ and
write (6.8) as

$$f(t) = (p + q e^{it})^n. \qquad (6.9a)$$

If $n$ is an integer then this is the characteristic function of the binomial distribution.
If $n > 0$ is not an integer then it is easily seen that $f(t)$ is the Fourier-Stieltjes
transform of a function which is not monotone. Therefore (6.9a) is a characteristic
function only if $n$ is an integer.

Case 2: $0 < \kappa_1/\kappa_2 < 1$. Then $\kappa_2-\kappa_1 > 0$ and
$r = \kappa_1^2/(\kappa_2-\kappa_1) > 0$. We put $p = \kappa_1/\kappa_2$, $q = 1-p$ and
obtain from (6.8)

$$f(t) = p^r (1 - q e^{it})^{-r}. \qquad (6.9b)$$

This is the characteristic function of the negative binomial distribution.

Case 3: $\kappa_1 < 0$. Then $-\kappa_1 > 0$ and $\kappa_2-\kappa_1 > 0$; we put
$p = -\kappa_1/(\kappa_2-\kappa_1)$, $q = 1-p$ so that $0 < p < 1$, $0 < q < 1$ and write
$r = \kappa_1^2/(\kappa_2-\kappa_1) > 0$. Using these notations we can transform (6.8) into

$$f(t) = p^r e^{-itr} (1 - q e^{-it})^{-r}. \qquad (6.9c)$$

We see then immediately that

$$e^{-itr} f(-t) = p^r (1 - q e^{it})^{-r};$$

this is the last statement of the theorem.
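The three cases of the proof amount to a small decision rule on the pair $(\kappa_1, \kappa_2)$. The sketch below only restates that rule; the function name `classify_by_cumulants` and the returned labels are hypothetical and chosen for illustration. The parameters $p$, $q$, $n$, $r$ follow the notation of (6.9a)-(6.9c), so in Case 1 the success probability of the binomial distribution is $q$.

```python
from fractions import Fraction


def classify_by_cumulants(kappa1, kappa2):
    """Map (kappa1, kappa2) to the distribution singled out by (6.8)."""
    k1, k2 = Fraction(kappa1), Fraction(kappa2)
    # Degenerate or excluded configurations (kappa2 is a variance, so it must be > 0).
    if k2 <= 0 or k1 == 0 or k1 == k2:
        return ("excluded", {})
    if k1 > 0 and k2 < k1:
        # Case 1: 0 < kappa2/kappa1 < 1; a distribution only if n is an integer.
        n = k1 * k1 / (k1 - k2)
        if n.denominator != 1:
            return ("not a characteristic function", {"n": n})
        p = k2 / k1
        return ("binomial", {"n": int(n), "p": p, "q": 1 - p})
    if k1 > 0:
        # Case 2: 0 < kappa1/kappa2 < 1.
        p = k1 / k2
        return ("negative binomial", {"r": k1 * k1 / (k2 - k1), "p": p, "q": 1 - p})
    # Case 3: kappa1 < 0.
    p = -k1 / (k2 - k1)
    return ("conjugate negative binomial, shifted by r",
            {"r": k1 * k1 / (k2 - k1), "p": p, "q": 1 - p})
```

For example, mean 2 and variance 1 give a binomial distribution with $n = 4$ and $p = q = 1/2$, while mean 1 and variance 2 give a negative binomial distribution with $r = 1$ and $p = 1/2$.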


7. SUPPLEMENTARY REMARKS


Since it is not possible to exhaust our topic without making the paper exces- 
sively long, we wish to indicate here a few problems which we did not cover in the 
preceding sections. 

We mention first the problem of deriving conditions for the independence of 
certain statistics in a given population. A number of results concerning linear and 
quadratic statistics in normal populations may be found in Laha and Lukacs (1963). 


Let $X$ and $Y$ be two random variables and suppose that the expectation
$E(X^r Y^s)$ exists. We say that $X$ and $Y$ are uncorrelated of order $(r, s)$ if the
relations

$$E(X^i Y^j) = E(X^i)\,E(Y^j) \qquad (7.1)$$

hold for $i = 1, 2, \ldots, r$; $j = 1, 2, \ldots, s$. Two random variables uncorrelated of
order (1, 1) are clearly uncorrelated in the usual sense. It is well known that in a normal
population two linear statistics are independent if, and only if, they are uncorrelated
[of order (1, 1)]. A few results concerning uncorrelatedness of higher order and
independence of certain linear and quadratic statistics in normal variables were also
obtained. Linnik (1954), (1958-59) studied also the more general problem of two
polynomial statistics in a normal population, raised some open problems and gave
also some interesting results. The starting point for investigations on independent
statistics in the normal population is probably the famous theorem of Cochran (1933)
on the decomposition of chi-squares. The method of characteristic functions can
also be applied to some problems concerning stochastic processes; we mention in
particular independently and identically distributed stochastic integrals; their study
leads to characterizations of the Wiener-Lévy process and it might be possible to get
similar results for stable processes.
CRITERIA OF ESTIMATION IN LARGE SAMPLES*


By C. RADHAKRISHNA RAO 
Indian Statistical Institute 


SUMMARY. The existing criteria of consistency and efficiency of estimation have been examined
in the light of recent criticisms and controversies concerning them. A new criterion called uniform first
order efficiency which is a better indicator of the performance of an estimator in statistical inference has
been introduced. It is, however, pointed out that the anomaly in the earlier criterion of efficiency can
be removed by considering consistent estimators which converge to a normal distribution uniformly in
compacts of the parameter space. First order efficiency by itself cannot discriminate among a large
number of estimation procedures. Therefore, an additional criterion called the second order efficiency
has been introduced, which considerably restricts the class of useful estimation procedures and by which
several well established estimation procedures could be eliminated in favour of the method of maximum
likelihood.


1. INTRODUCTION

Estimation, as conceived by the late Sir Ronald Fisher, is one of the methodological
processes by which data are analysed or reduced for purposes of drawing inferences
on the unknown population from which data are observed. For instance a sample
survey of consumer expenditure may provide a mass of data which by themselves are
difficult to interpret. We therefore need summary figures or estimates which provide
a fair idea of the characteristics of the population sampled and enable us to answer
a variety of questions. Has the per-capita expenditure on rice increased over time
and is it different in different regions? Does a given estimate reasonably agree with
what is believed to be the per-capita expenditure, or with another estimate obtained
by a parallel agency? No clear indication of answers to such questions would be
available without computing from the data an estimate which represents the per-capita
expenditure and other quantities which indicate the possible extent of error in the
estimate and guide us in making judicious statements about the population. Further
questions may suggest themselves after some initial questions are answered with the
estimates already obtained.

There has been a tendency to consider the problem of estimation as a part of
decision theory, requiring a prestated purpose for the estimate and specification of
loss resulting from any given magnitude of error in the estimate. It is not, however,
my view that the latter approach should be completely abandoned. There may be
situations where such an approach is necessary and appropriate as in the case of
acceptance procedures in industrial statistics. But in a majority of situations the framework
of decision theory may not be applicable and it may be necessary to consider the problem
of estimation from a wider point of view as 'extraction of information' for drawing
inferences and for recording it, as a substitute for the entire data, for possible future uses.
Since estimation, however it may be viewed, involves reduction of data, it
may entail some loss of information for we are interpreting the data through the
estimates. The criteria for choice of estimators should then relate to minimisation of
loss of information. Unfortunately, no objective measurement of information is
possible and hence the difficulty in the formulation of suitable criteria. However, asymptotic
theories of estimation based on the criteria of consistency and efficiency (to be referred
to as v-efficiency) have been constructed and certain methods have been shown
to yield estimators satisfying these criteria. It was thought that the criteria of
consistency and v-efficiency ensure minimum loss of information due to estimation as
the sample size increases.

*Lecture delivered on the occasion of the presentation of Shanti Swarup Bhatnagar award
for 1959. This paper has been included in Contributions to Statistics presented to
Professor P. C. Mahalanobis on the occasion of his 70th birthday.


These theories are not satisfactory due to three main reasons. Firstly, all the 
results relate to limiting properties as the sample size tends to infinity and no indication 
is available of their applicability to samples of sizes ordinarily met with in actual 
practice. Secondly, there seem to exist infinitely many procedures leading to esti- 
mators satisfying the stated criteria and no further criteria have been suggested to 
distinguish among them. Thirdly, the criterion of v-efficiency does not provide a 
satisfactory index of the performance of an estimator from the view point of statistical 
inference. 


I have attempted to resolve these difficulties in some ways (Rao, 1960b, 1961,
1962). Firstly, the criterion of v-efficiency has been reformulated to ensure some
optimum asymptotic properties of an estimator used in the place of the sample for
purposes of inference. This is called first order efficiency. Secondly, another criterion
known as second order efficiency has been introduced to distinguish among different
procedures leading to first order efficient estimators. On the basis of the latter
criterion several well-known procedures, such as the minimum chi-square, modified
minimum chi-square etc., which are considered as competitors to maximum likelihood on
the basis of v-efficiency, could be eliminated. The second order efficiency also provides
a partial answer to the question of sample size. Correction terms of order $O(n^{-1})$ to
the estimate and of order $O(n^{-2})$ to its precision have been determined for several
estimation procedures.

The present paper is intended for a further discussion of first and second order
efficiencies and to introduce a new concept of uniform efficiency which seems to be
important when asymptotic theories are considered. Some new light is thrown on
the use of asymptotic variance of an estimator as an index of efficiency. Further the
second order efficiency is linked with terms of order $(n^{-2})$ in the asymptotic expansion
of the variance of an estimator. Problems requiring further investigation are indicated.

In undertaking these studies I have been guided by the basic ideas contained
in two fundamental papers on estimation by Fisher (1922, 1925). I wish to record
my debt of gratitude to the late Sir Ronald Fisher for the encouragement I received
from him when I was working under his guidance at Cambridge and during his recent
visits to the Indian Statistical Institute. I also wish to thank Professor P. C.
Mahalanobis, the Director of the Indian Statistical Institute, for his stimulating discussions
on the logic of statistical inference and the purpose of statistics to which I have been
constantly exposed.
2. CONSISTENCY

The criterion of consistency is in the nature of identifying the parameter for
which a statistic is said to be an estimator. This is important from the practical point
of view of interpreting the estimates. There are various definitions of consistency of
which the one frequently referred to in literature is probability consistency (PC).

Definition 2A: Probability consistency (PC). A sequence of statistics
$T_n$ is said to be consistent for a parameter $\theta$ if $T_n \to \theta$ in probability.

But one criticism of such a definition is that it places no restriction on the
statistic for any given $n$. An alternative definition of consistency due to Fisher,
called Fisher consistency (FC), seems to be more satisfactory in this respect, but
somewhat restrictive in application.

Definition 2B: Fisher consistency (FC). A statistic $T_n = f(S_n)$, where $S_n$
is the empirical distribution function based on $n$ observations and $f$ is a weakly
continuous functional defined on the space of distribution functions, is said to be Fisher
consistent if $f(F_\theta) = \theta$, where $F_\theta$ is the true distribution function from
which observations are drawn.

It is easy to see that FC $\Rightarrow$ PC and that FC refers to a restriction on the
estimate for any finite $n$ and is not just a limiting property of a sequence of statistics.
But it is applicable only in situations where independent observations are drawn from
a population characterised by a distribution function.


3. EFFICIENCY

Efficiency of an estimator, which we rename as v-efficiency because it is linked
with asymptotic variance, is usually defined as follows:

Definition 3A: v-efficiency. Consider the class $\{T_n\}$ of consistent
asymptotically normal (CAN) estimators of $\theta$, i.e., for each $T_n$,
$n^{1/2}(T_n - \theta) \to N[0, v(\theta)]$ in law. Any member of the sub-class for which
$v(\theta) = 1/i(\theta)$ is said to be an efficient estimator of $\theta$.

It was believed that for a CAN estimator, the asymptotic variance $v(\theta)$
satisfies the inequality

$$v(\theta) \ge 1/i(\theta) \qquad (3.1)$$

and that an estimator with the smallest $v(\theta)$ has maximum concentration round the
true value in sufficiently large samples. Unfortunately, both these results are not
strictly true without any restrictions on the estimating function or the mode of
convergence to normality. About ten years ago Hodges (see LeCam, 1953) constructed
an example to show that the result (3.1) is not true in general. Let

$$T_n = \begin{cases} \bar{x} & (|\bar{x}| \ge n^{-1/4}) \\ \alpha\bar{x} & (|\bar{x}| < n^{-1/4}) \end{cases} \qquad (3.2)$$

where $\bar{x}$ is the average of $n$ observations from $N(\theta, 1)$ and $\alpha$ is
arbitrary. It may be verified that $T_n$ is also CAN with

$$v(\theta) = \begin{cases} 1 & \text{for } \theta \ne 0 \\ \alpha^2 & \text{for } \theta = 0 \end{cases}$$


so that the variance at $\theta = 0$ can be made arbitrarily small. Such an estimate has
been termed 'super efficient.' This example throws in doubt the exact significance
of v-efficiency.

Even if there is no lower bound to asymptotic variance, the question remains
as to whether we should prefer the estimator $T_n$ as defined in (3.2) to $\bar{x}$ because
of smaller asymptotic variance at least at one point and equivalence elsewhere. It can be
easily seen that for any given $n$, $T_n$ has better concentration than $\bar{x}$ in the
sense of higher probabilities for intervals enclosing the true value, only for the special
value $\theta = 0$ and a small neighbourhood of zero, and thereafter for a continuous set
of $\theta$, $T_n$ has less concentration than $\bar{x}$. This may also be inferred by
comparing the mean square errors (m.s.e.) of $T_n$ and $\bar{x}$. For any given $n$ the
m.s.e. of $T_n$ is smaller than that of $\bar{x}$ for $\theta$ close to zero and thereafter
it stays larger, although the difference tends to zero as $\theta$ increases. It may,
however, be observed that the m.s.e. in either case tends to the corresponding asymptotic
value but the anomaly arises due to convergence being not uniform in the case of $T_n$.
We shall have occasion to stress the importance of uniform convergence in a later section
of this paper. An attempt to improve the concentration in the neighbourhood of a
particular value of the parameter seems to have injured the performance of the estimator
at other values. A general statement to this effect is proved by LeCam (1953) using
bounded risk functions. Superiority as judged by asymptotic variance function need not
therefore indicate greater concentration for all values of the unknown parameter even in
sufficiently large samples.
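The behaviour of the mean square error described above can be reproduced numerically. The following sketch is an illustration under stated assumptions: it takes the estimator of (3.2), equal to $\bar{x}$ when $|\bar{x}| \ge n^{-1/4}$ and to $\alpha\bar{x}$ otherwise, with $\bar{x}$ distributed as $N(\theta, 1/n)$, and evaluates $n$ times its m.s.e. by Simpson's rule. The function name `hodges_scaled_mse`, the choice $\alpha = 0.5$ and the number of integration steps are hypothetical. The m.s.e. of $\bar{x}$ on the same scale is exactly 1.

```python
import math


def hodges_scaled_mse(theta, n, alpha, steps=2000):
    """Return n * E[(T_n - theta)^2] for the estimator (3.2); steps must be even."""
    c = n ** -0.25                      # threshold n^(-1/4)
    sigma = 1.0 / math.sqrt(n)          # standard deviation of the sample mean

    def integrand(x):
        # Change in squared error caused by replacing x with alpha * x on |x| < c,
        # weighted by the N(theta, 1/n) density of the sample mean.
        pdf = math.exp(-0.5 * ((x - theta) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        return ((alpha * x - theta) ** 2 - (x - theta) ** 2) * pdf

    h = 2 * c / steps
    total = integrand(-c) + integrand(c)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(-c + k * h)
    correction = total * h / 3          # Simpson's rule on [-c, c]
    return n * (sigma ** 2 + correction)


n, alpha = 100, 0.5
at_zero = hodges_scaled_mse(0.0, n, alpha)           # below 1: better than the mean
near_zero = hodges_scaled_mse(n ** -0.25, n, alpha)  # above 1: worse than the mean
far_away = hodges_scaled_mse(2.0, n, alpha)          # essentially 1
```

With these values the scaled m.s.e. is below 1 at $\theta = 0$, above 1 at $\theta = n^{-1/4}$ and indistinguishable from 1 at $\theta = 2$, in line with the statement that the gain at zero is paid for at neighbouring parameter values.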
Consider another super efficient estimator

$$U_n = \begin{cases} \bar{x} & (|\bar{x}| \ge n^{-1/4}) \\ \alpha\,(2/\pi)^{1/2}\,\tilde{x}_n & (|\bar{x}| < n^{-1/4}) \end{cases} \qquad (3.3)$$

where $\tilde{x}_n$ is the sample median and $\alpha$ is arbitrarily small. The statistics
(3.2) and (3.3) have the same asymptotic variance and are therefore indistinguishable on
the basis of v-efficiency. There must, however, be some difference in the performance of
these two statistics, the estimator (3.3) being essentially equivalent to the sample median
when $\theta = 0$.

Since there is no lower bound to the asymptotic variance of a CAN estimator,
it may be thought that an improvement over $\bar{x}$ is possible by constructing a statistic
$T_n$ with a uniformly lower asymptotic variance and thereby increasing the concentration
at every value of the parameter, as at $\theta = 0$ in examples (3.2) and (3.3). LeCam
(1953) has demonstrated that such an improvement is not possible for any continuous
interval of the parameter and the set of points with a lower asymptotic variance
has to be of Lebesgue measure zero.

Can we avoid all these troubles by considering only efficient estimators in
the sense of Definition 3A and not trying to improve upon the asymptotic variance
$1/i(\theta)$? The following example provides an answer to this question. Let

$$W_n = \begin{cases} \bar{x} & (|\bar{x}| \ge n^{-1/4}) \\ (2/\pi)^{1/2}\,\tilde{x}_n & (|\bar{x}| < n^{-1/4}) \end{cases} \qquad (3.4)$$

where $\bar{x}$ is the sample mean and $\tilde{x}_n$ is the sample median. $W_n$ is also CAN
with the same asymptotic variance $v(\theta) = 1$ for all $\theta$ as that of $\bar{x}$. The
estimator $W_n$ is thus indistinguishable from $\bar{x}$ so far as consistency and
v-efficiency are concerned. Yet for any given large $n$, $W_n$ has less concentration than
that of $\bar{x}$ for all values of $\theta$.


It is no doubt true that an estimator having a higher concentration than
another for every value of $\theta$ is more useful in drawing inferences on $\theta$ from an
observed estimate. That such a situation is realised for an estimator compared to another
for sufficiently large $n$ cannot be judged by comparing the asymptotic variances only
as shown by examples (3.2), (3.3) and (3.4). It is, however, difficult to choose between
two estimators when one does not have uniformly better concentration than another
without bringing in other considerations. For instance, we may have an estimator
whose distribution for a particular value of $\theta$ is highly concentrated but it will be a
poor discriminator between this value of $\theta$ and other values close to it if the
concentration at the other values is low. To compare the estimators $\bar{x}$, $T_n$, $U_n$
and $W_n$, we may examine one aspect of their usefulness in statistical inference, e.g., the
power functions of tests based on these statistics to test the hypothesis that $\theta$ has
an assigned value. It may be inferred from the optimum properties possessed by $\bar{x}$
that in large samples $\bar{x}$ and $T_n$ tend to have the same local power (Rao, 1962)
whereas $U_n$ and $W_n$, being equivalent to the sample median when $\theta = 0$, will have
a smaller local power. Since v-efficiency does not enable us to distinguish between
estimators such as $\bar{x}$ or $T_n$ and $U_n$ or $W_n$ we shall consider an alternative
definition of efficiency (to be called first order) which appears to be more satisfactory.
Definition 3B: First order efficiency. A statistic $T_n$ is said to be efficient
if

$$n^{1/2}\,|T_n - \theta - \beta(\theta) Z_n| \to 0 \text{ in probability} \qquad (3.5)$$

where $\beta(\theta)$ is a function of $\theta$ only, and
$Z_n = n^{-1}[d \log P(X, \theta)/d\theta]$, $P(X, \theta)$ being the density of the
observations. The condition (3.5) implies that the asymptotic correlation between
$T_n$ and $Z_n$ is unity.

It is shown (Rao, 1960b) that according to Definition 3B, $T_n$ is just efficient,
although it is super efficient in the sense of v-efficiency, and $U_n$ and $W_n$ are not
efficient at $\theta = 0$ although $U_n$ and $W_n$ are super efficient and efficient
respectively in the old sense. If the efficiency of an estimator is measured by the square
of its asymptotic correlation with $Z_n$, then $U_n$ and $W_n$ have the same efficiency
$2/\pi < 1$, although $U_n$ and $W_n$ have different asymptotic variances. It is also shown
(Theorem 2 in Rao, 1962) that an estimator efficient in the sense of Definition 3B provides
a locally more powerful test of a simple hypothesis than any other test in sufficiently
large samples. Another important consequence of Definition 3B of efficiency is that the
ratio of $I(T_n)$, the Fisher's information contained in the estimator $T_n$, to $I$, the
total information in the sample, tends to unity (Doob, 1934; Rao, 1961).
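The value $2/\pi$ quoted for estimators that behave like the sample median can be illustrated by simulation. The sketch below is a rough check and not a derivation: it assumes observations from $N(0, 1)$, for which $Z_n$ equals the sample mean, and estimates the squared correlation between the sample mean and the sample median over repeated samples. The function name, the sample size 101, the number of replications and the seed are all hypothetical choices.

```python
import math
import random
import statistics


def squared_corr_mean_median(n=101, reps=4000, seed=0):
    """Monte Carlo estimate of corr(sample mean, sample median)^2 for N(0, 1) data."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    mean_bar = statistics.fmean(means)
    median_bar = statistics.fmean(medians)
    cov = sum((m - mean_bar) * (d - median_bar) for m, d in zip(means, medians))
    var_mean = sum((m - mean_bar) ** 2 for m in means)
    var_median = sum((d - median_bar) ** 2 for d in medians)
    return cov * cov / (var_mean * var_median)


estimate = squared_corr_mean_median()
target = 2 / math.pi   # limiting squared correlation, about 0.64
```

The estimate is subject to sampling error and to a small finite-sample effect, so only rough agreement with $2/\pi$ should be expected.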
The Definition 3B of efficiency implies that the limiting distribution of
$n^{1/2}(T_n - \theta)$ is normal for any given $\theta$ and in large samples, any simple
hypothesis on $\theta$ can be tested by using the normal approximation. But in problems of
statistical inference, it is often necessary to express our preference for different values
of $\theta$, on the basis of the estimate as in the case of interval estimation, and not just
examine whether a particular value is true or not. There is thus for a given $n$, a need to
consider the whole set of distributions of the estimator for all values of $\theta$ at least
in a small interval (in large samples) where different values of $\theta$ have to be
distinguished. If the distributions are to be approximated by appropriate normal
distributions, it seems to be a logical necessity that the convergence to normality of the
chosen estimator should be uniform in compacts of $\theta$. Under fairly general conditions
the convergence to normality of $n^{1/2} Z_n(\theta)$ is found to be uniform, in which case
the desired property is assured by the following definition of uniform first order
efficiency.

Definition 3C: Uniform first order efficiency. An estimator is said to have
uniform first order efficiency if

$$n^{1/2}\,|T_n - \theta - Z_n(\theta)/i(\theta)| \xrightarrow{UL} 0 \qquad (3.6)$$

in compacts of $\Theta$, where the symbol UL stands for uniform convergence in law and
$i(\theta)$ is Fisher's information per observation.

It would have been more natural to define uniform first order efficiency as

$$n^{1/2}\,|T_n - \theta - \beta(\theta) Z_n(\theta)| \xrightarrow{UL} 0 \qquad (3.7)$$

without specifying the value of $\beta(\theta)$ as in (3.6). It appears that if the
condition (3.7) is satisfied for various values of $\beta(\theta)$, then it is desirable to
choose an estimator for which $\beta(\theta)$ is a minimum, which is shown to be
$[i(\theta)]^{-1}$ in section 4 of this paper.


4. SOME LEMMAS


Notations and assumptions. We consider only sequences of independent and
identically distributed variables with probability density $p(x, \theta)$, where $\theta$ is
a parameter with values in an open interval $\Theta$. In the case of discrete variables,
$p(x, \theta)$ represents the probability of $x$. The probability density of $n$
observations is denoted by $P(X_n, \theta)$. The first derivative
$p'(x, \theta) = dp(x, \theta)/d\theta$ exists. Let $a(x, \theta) = p'(x, \theta)/p(x, \theta)$
and let

$$i(\theta) = E_\theta[a(x, \theta)]^2,$$

Fisher's information per observation, be continuous in $\theta$. The following assumptions
are made in the various lemmas of this section.

Assumption I:
(i) $\mu(\theta_0, \theta) = E_\theta[a(x, \theta_0)] = (\theta - \theta_0)\, i(\theta_0) + o(\theta - \theta_0)$
(ii) $[\sigma(\theta_0, \theta)]^2 = V_\theta[a(x, \theta_0)] = i(\theta_0) + o(1)$
(iii) $c(\theta_0, \theta) = \mathrm{cov}_\theta[a(x, \theta), a(x, \theta_0)] = i(\theta_0) + o(1)$

Assumption II:
$$E_\theta \left| \frac{p'(x, \theta)}{p(x, \theta)} \right|^{2+\epsilon} < \infty$$
for some $\epsilon > 0$ in compacts of $\theta$.

Assumption III: Let $b(x, \theta, \theta_0) = \log[p(x, \theta)/p(x, \theta_0)]$.
(i) $\xi(\theta, \theta_0) = E_{\theta_0}[b(x, \theta, \theta_0)] = -\tfrac{1}{2}(\theta - \theta_0)^2\, i(\theta_0) + o(\theta - \theta_0)^2$
(ii) $[\eta(\theta, \theta_0)]^2 = V_{\theta_0}[b(x, \theta, \theta_0)] = (\theta - \theta_0)^2\, i(\theta_0) + o(\theta - \theta_0)^2$
(iii) $\zeta(\theta, \theta_0) = \mathrm{cov}_{\theta_0}[b(x, \theta, \theta_0), a(x, \theta_0)] = (\theta - \theta_0)\, i(\theta_0) + o(\theta - \theta_0)$

Assumption IV:
(i) $\dfrac{d^r}{d\theta^r} \displaystyle\int_{E_n} P(X_n, \theta)\, dv = \int_{E_n} \frac{d^r P(X_n, \theta)}{d\theta^r}\, dv$ $(r = 1, 2)$
for every Lebesgue measurable set $E_n$,
(ii) $E_\theta \left| \dfrac{p''(x, \theta)}{p(x, \theta)} \right|$ is bounded in compacts of $\theta$.

The Assumptions I, II, and III are not severe. Conditions may be imposed directly on
the probability density to ensure them. For instance restrictions such as those
imposed by Daniels (1961) on the probability density will imply the conditions (i)-(iii)
of Assumption I.

Lemma 1: Let $\beta_n(\theta)$ be the power function of any test of the hypothesis
$\theta = \theta_0$ based on a sample of $n$ independent observations, at probability level
$\alpha$. Then under Assumptions I-III

$$\lim_{n \to \infty} \beta_n(\theta_0 + \delta n^{-1/2}) \le \Phi(a - \delta i^{1/2}) \qquad (4.1)$$

where $\Phi$ is the distribution function of $N(0, 1)$ and $a$ is the upper $\alpha$ point
of $N(0, 1)$.

The limit of $\beta_n(\theta_0 + \delta n^{-1/2})$ when it exists is known as Pitman power
of the test. Lemma 1 gives an upper bound to Pitman power under some conditions on the
probability density of the observations. Two limit theorems of a different type concerning
the local power of a test have been given in an earlier paper of the author
(Rao, 1962).
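As a numerical illustration of the bound in Lemma 1, the sketch below evaluates the limiting power for given $\alpha$, $\delta$ and information $i$. It rests on one reading that should be stated loudly: for the bound to increase with $\delta$, as a power must, and for the inequalities used later in this section to hold, $\Phi(a - \delta i^{1/2})$ is taken here as the upper tail probability $P\{N(0,1) > a - \delta i^{1/2}\}$. The function names are hypothetical. The second function gives the corresponding limit, with $\delta/\gamma$ in place of $\delta i^{1/2}$, for a test based on an estimator with asymptotic variance $\gamma^2$, as derived later in this section.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def pitman_power_bound(alpha, delta, info):
    """Upper bound of Lemma 1: P{N(0,1) > a - delta * sqrt(info)}."""
    a = STD_NORMAL.inv_cdf(1.0 - alpha)          # upper alpha point of N(0, 1)
    return 1.0 - STD_NORMAL.cdf(a - delta * math.sqrt(info))


def pitman_power_of_estimator(alpha, delta, gamma):
    """Limiting power of a one-sided test built on an estimator with asymptotic s.d. gamma."""
    a = STD_NORMAL.inv_cdf(1.0 - alpha)
    return 1.0 - STD_NORMAL.cdf(a - delta / gamma)


# N(theta, 1) observations: information per observation is 1.
bound = pitman_power_bound(alpha=0.05, delta=1.0, info=1.0)
# Sample median of N(theta, 1) data: asymptotic variance pi / 2.
median_power = pitman_power_of_estimator(0.05, 1.0, math.sqrt(math.pi / 2))
```

For $\alpha = 0.05$ and $\delta = 1$ the bound is roughly 0.26, and the test based on the sample median falls short of it, which is the kind of comparison made between $\bar{x}$ and the median-type estimators $U_n$ and $W_n$.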
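Let

$$Y_n = \frac{1}{n} \log \frac{P(X_n, \theta)}{P(X_n, \theta_0)} = \frac{1}{n} \sum b(x_i, \theta, \theta_0), \qquad Z_n(\theta) = \frac{1}{n}\, \frac{P'(X_n, \theta)}{P(X_n, \theta)} = \frac{1}{n} \sum a(x_i, \theta),$$

$$u_n(\theta) = n^{1/2}[Y_n - \xi(\theta, \theta_0)]/\eta(\theta, \theta_0),$$
$$v_n(\theta) = n^{1/2}[Y_n + \xi(\theta_0, \theta)]/\eta(\theta_0, \theta),$$
$$w_n(\theta) = n^{1/2} Z_n(\theta)/[i(\theta)]^{1/2}.$$

Under the Assumptions I, II, and III, it is easy to show that

(i) $V_{\theta_0}[u_n(\theta) - w_n(\theta_0)] \to 0$ as $\theta \to \theta_0$, (4.2)
(ii) $V_\theta[v_n(\theta) - w_n(\theta)] \to 0$ as $\theta \to \theta_0$, (4.3)
(iii) $w_n(\theta) \xrightarrow{UL} N(0, 1)$ in compacts of $\theta$. (4.4)

The best test of the hypothesis $H_0: \theta = \theta_0$ against the alternative
$\theta_n = \theta_0 + \delta n^{-1/2}$ is

$$u_n(\theta_n) \ge c_n \qquad (4.5)$$

where $c_n$ is chosen such that the size of the test $\to \alpha$ as $n \to \infty$.

$$\lim_{n \to \infty} P_{\theta_0}[u_n(\theta_n) \ge c_n] = \lim_{n \to \infty} P_{\theta_0}[u_n(\theta_n) - w_n(\theta_0) + w_n(\theta_0) \ge c_n] = \lim_{n \to \infty} P_{\theta_0}[w_n(\theta_0) \ge c_n] \text{ by (4.2)} = \alpha.$$

Since the limiting distribution of $w_n(\theta_0)$ is $N(0, 1)$, $c_n \to a$, the upper
$\alpha$ point of $N(0, 1)$. The power of the test (4.5) is

$$\beta_n^*(\theta_n) = P_{\theta_n}[u_n(\theta_n) \ge c_n] = P_{\theta_n}\left[ v_n(\theta_n) - w_n(\theta_n) + w_n(\theta_n) \ge c_n \frac{\eta(\theta_n, \theta_0)}{\eta(\theta_0, \theta_n)} + \frac{n^{1/2}[\xi(\theta_n, \theta_0) + \xi(\theta_0, \theta_n)]}{\eta(\theta_0, \theta_n)} \right]$$

writing $u_n(\theta_n)$ in terms of $v_n(\theta_n)$ using their definitions.

$$\lim_{n \to \infty} \beta_n^*(\theta_n) = \lim_{n \to \infty} P_{\theta_n}\left[ w_n(\theta_n) \ge c_n \frac{\eta(\theta_n, \theta_0)}{\eta(\theta_0, \theta_n)} + \frac{n^{1/2}[\xi(\theta_n, \theta_0) + \xi(\theta_0, \theta_n)]}{\eta(\theta_0, \theta_n)} \right] = \Phi(a - \delta i^{1/2})$$

using (4.4) of uniform convergence, where
$-\delta i^{1/2} = \lim_{n \to \infty} n^{1/2}[\xi(\theta_0, \theta_n) + \xi(\theta_n, \theta_0)]/\eta(\theta_0, \theta_n)$.
The result of Lemma 1 follows by observing that $\beta_n^*(\theta) \ge \beta_n(\theta)$ for each
$\theta$, where $\beta_n(\theta)$ is the power of any other test.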
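Lemma 2: Let $n^{1/2}(T_n - \theta) \xrightarrow{UL} N(0, [\gamma(\theta)]^2)$ in compacts of
$\theta$, where $\gamma(\theta)$ is bounded. Then
(i) $\gamma(\theta)$ is continuous if the probability density $p(x, \theta)$ is continuous
in $\theta$.
(ii) $[\gamma(\theta)]^2 \ge 1/i(\theta)$ under Assumptions I-III.

We use an argument similar to that of LeCam (1960) to prove (i) of Lemma 2:
If $p(x, \theta)$ is continuous in $\theta$ the distribution function $F_{\theta n}$ of $T_n$
is continuous in $\theta$ and consequently the characteristic function $c_n(t, \theta)$ of
$U_n = n^{1/2}(T_n - \theta)$ is continuous in $\theta$. Since $U_n$ converges uniformly,
$c_n(t, \theta)$ converges uniformly to $c(t, \theta)$, the characteristic function of the
asymptotic distribution $N(0, [\gamma(\theta)]^2)$. But then $c(t, \theta)$ is continuous.
Hence $\gamma(\theta)$ is continuous in the interval of the uniform convergence of $U_n$.

Let us consider the test

$$\frac{n^{1/2}(T_n - \theta_0)}{\gamma(\theta_0)} \ge a_n$$

of the hypothesis $\theta = \theta_0$, at a probability level $\alpha$. The power of the
test at $\theta$ is

$$\beta_n(\theta) = P_\theta\{n^{1/2}(T_n - \theta_0) \ge a_n \gamma(\theta_0)\} = P_\theta\left\{ \frac{n^{1/2}(T_n - \theta)}{\gamma(\theta)} \ge \frac{a_n \gamma(\theta_0)}{\gamma(\theta)} - \frac{n^{1/2}(\theta - \theta_0)}{\gamma(\theta)} \right\}.$$

Substituting $\theta = \theta_0 + \delta n^{-1/2}$ and observing that the convergence to
normality of $n^{1/2}(T_n - \theta)$ is uniform in $\theta$, we find

$$\lim_{n \to \infty} \beta_n(\theta_0 + \delta n^{-1/2}) = \Phi[a - \delta/\gamma(\theta_0)] \qquad (4.6)$$

where the argument of $\Phi$ in (4.6) is the limit of

$$\frac{a_n \gamma(\theta_0)}{\gamma(\theta)} - \frac{n^{1/2}(\theta - \theta_0)}{\gamma(\theta)}$$

with $\theta = \theta_0 + \delta n^{-1/2}$, as $n \to \infty$.

It is shown in Lemma 1 under Assumptions I-III that

$$\lim_{n \to \infty} \beta_n(\theta_0 + \delta n^{-1/2}) \le \Phi(a - \delta i^{1/2}).$$

Hence from (4.6)

$$\Phi(a - \delta/\gamma(\theta_0)) \le \Phi(a - \delta i^{1/2})$$

or $a - \delta/\gamma(\theta_0) \ge a - \delta i^{1/2}$, i.e.,
$\gamma^2(\theta_0) \ge 1/i(\theta_0)$ (for any given $\theta_0$).

We thus see that the asymptotic variance of a CUAN (consistent uniformly
asymptotically normal) estimator has Fisher's lower bound $1/i(\theta)$ when the probability
density satisfies some regularity conditions. It appears then that in the examples of
Hodges and LeCam, super efficiency in the sense of having asymptotic variance less
than $1/i(\theta)$ has been achieved at the sacrifice of uniform convergence.

Lemma 3: Let

$$n^{1/2}\left\{ \frac{Z_n(\theta)}{[i(\theta)]^{1/2}} - \frac{T_n - \theta}{\gamma(\theta)} \right\} \xrightarrow{UL} 0. \qquad (4.7)$$

Then
(i) $n^{1/2}(T_n - \theta) \xrightarrow{UL} N(0, [\gamma(\theta)]^2)$ in compacts of $\theta$,
where $\gamma(\theta)$ is continuous, under Assumption II and continuity of $p(x, \theta)$, and
(ii) $\gamma(\theta) = [i(\theta)]^{-1/2}$ under Assumptions II and IV.

Under Assumption II,

$$\frac{n^{1/2} Z_n(\theta)}{[i(\theta)]^{1/2}} \xrightarrow{UL} N(0, 1) \qquad (4.8)$$

and hence

$$\frac{n^{1/2}(T_n - \theta)}{\gamma(\theta)} \xrightarrow{UL} N(0, 1) \qquad (4.9)$$

since by the condition (4.7) of Lemma 3, the difference of (4.8) and (4.9) $\to 0$. Hence
the result (i) of Lemma 3 follows.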

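Consider the test

$$n^{1/2}(T_n - \theta_0) \ge c_n \gamma(\theta_0) \qquad (4.10)$$

of the hypothesis $\theta = \theta_0$ at a probability level $\alpha$, where $c_n \to a$, the
upper $\alpha$ probability point of $N(0, 1)$. The power of the test (4.10) at
$\theta_n = \theta_0 + \delta n^{-1/2}$ is

$$\beta_n(\theta_0 + \delta n^{-1/2}) = P_{\theta_n}[n^{1/2}(T_n - \theta_0) \ge c_n \gamma(\theta_0)] = P_{\theta_n}\left[ \frac{n^{1/2}(T_n - \theta_n)}{\gamma(\theta_n)} \ge \frac{c_n \gamma(\theta_0)}{\gamma(\theta_n)} - \frac{n^{1/2}(\theta_n - \theta_0)}{\gamma(\theta_n)} \right],$$

$$\lim_{n \to \infty} \beta_n(\theta_0 + \delta n^{-1/2}) = \Phi[a - \delta/\gamma(\theta_0)] \qquad (4.11)$$

using the uniform convergence proved in (i) of Lemma 3. It appears from (4.11) that a
test of the hypothesis $\theta = \theta_0$ based on $T_n$ does not attain the full Pitman
power $\Phi(a - \delta i^{1/2})$ unless $\gamma^{-2} = i$. It is therefore interesting to know
whether the condition (4.7) of Lemma 3 itself implies that $\gamma^{-2} = i$. I have been
able to establish this result only under the additional Assumption IV but it is worth
examining whether such a strong assumption is necessary.

Under condition (i) of Assumption IV we have the expansion of the power
function

$$\beta_n(\theta_0 + \delta n^{-1/2}) = \beta_n(\theta_0) + \delta n^{-1/2} \beta_n'(\theta_0) + \frac{\delta^2}{2n} \beta_n''(\theta') \qquad (4.12)$$

and under condition (ii) of Assumption IV, $\beta_n''(\theta')/n$ is bounded in an interval
of $\theta'$ enclosing $\theta_0$. From (4.12) we find that, for fixed $\delta$, the limit of
$[\beta_n(\theta_0 + \delta n^{-1/2}) - \beta_n(\theta_0)]/\delta$ as $n \to \infty$ differs
from $\lim n^{-1/2} \beta_n'(\theta_0)$ by a term of order $\delta$. Hence

$$\lim_{\delta \to 0} \lim_{n \to \infty} \frac{\beta_n(\theta_0 + \delta n^{-1/2}) - \beta_n(\theta_0)}{\delta} = \lim_{n \to \infty} n^{-1/2} \beta_n'(\theta_0). \qquad (4.13)$$

The limit of the R.H.S. of (4.13) is

$$(i/2\pi)^{1/2} e^{-a^2/2} \qquad (4.14)$$

using the result of Theorem 1 in an earlier paper (Rao, 1962). The value of the L.H.S.
of (4.13) is

$$(2\pi\gamma^2)^{-1/2} e^{-a^2/2}. \qquad (4.15)$$

Comparing (4.14) and (4.15) we find $\gamma^{-2} = i$ which establishes (ii) of Lemma 3.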
Lemma 4: Let $(n^{1/2}(T_n - \theta),\ n^{1/2} Z_n(\theta))$ converge in law to a bivariate
normal distribution uniformly in compacts of $\theta$, with the asymptotic covariance matrix

$$\begin{pmatrix} 1/i(\theta) & \rho(\theta) \\ \rho(\theta) & i(\theta) \end{pmatrix}.$$

Then under Assumptions I-III

$$n^{1/2}\,|Z_n(\theta) - i(T_n - \theta)| \xrightarrow{UL} 0.$$

The Lemma 4 implies that v-efficiency of UCAN is equivalent to uniform first
order efficiency.

Consider the test

$$n^{1/2}[T_n - \theta_0 + \lambda Z_n(\theta_0)] \ge c_n \sigma$$

where $\sigma^2 = 1/i(\theta) + \lambda^2 i(\theta) + 2\lambda\rho(\theta)$ is the asymptotic
variance of the test statistic. Using an argument similar to that of Lemma 3, the Pitman
power of the test is

$$\Phi(a - \delta(1 + \lambda i)/\sigma).$$

By the result of Lemma 1,

$$\frac{\delta(1 + \lambda i)}{\sigma} \le \delta i^{1/2}, \text{ for any arbitrary } \lambda,$$

or

$$(1 + \lambda i)^2 \le 1 + 2\lambda i \rho + \lambda^2 i^2$$

which implies $\rho = 1$ at $\theta = \theta_0$ (any chosen value). The asymptotic variance of
$n^{1/2}[Z_n(\theta) - i(T_n - \theta)]$ is then zero, and since the convergence is assumed to
be uniform the desired result follows.

The results of Lemmas 1-4 under the conditions assumed on the probability
density of the observations can be summarised as follows.

(i) If $T_n$ is UCAN, the asymptotic variance of $T_n$ has Fisher's lower bound
$1/ni$. This implies that the concept of v-efficiency is not void when the class of
estimators is restricted to UCAN.

It may be noted that the existence of such a lower bound to the asymptotic
variance was established by Kallianpur and Rao (1955) under some conditions on the
estimator such as Fisher consistency (FC) and Fréchet differentiability. Recently
Kallianpur (1963) relaxed the restriction of Fréchet differentiability to a weaker
form due to Volterra. Some observations on lower bound to asymptotic variance of
a CAN estimator have also been made by Bahadur (1960) from a different point of view.

(ii) Uniform first order efficiency of $T_n$ implies that it is CUAN and
v-efficient.

(iii) The converse of (ii) has been established under the additional assumption
that the joint asymptotic distribution of $T_n$ and $Z_n$ is bivariate normal and the
convergence is uniform in compacts of $\theta$.

It may be interesting to examine other conditions under which the existence
of a CUAN estimator $T_n$ with v-efficiency implies uniform first order efficiency.
Restrictions on the estimator such as those imposed by Kallianpur and Rao (1955)
and Kallianpur (1963) may be sufficient.

The investigations of Section 4 show that v-efficiency is a valid and useful
concept if only we restrict our consideration to estimators which are consistent and
uniformly asymptotically normal in compact intervals of the unknown parameter.

5. SECOND ORDER EFFICIENCY 


The second order efficiency is defined in earlier papers by Rao (1961, 1962) as the minimum asymptotic variance of

$$n[Z_n - \beta(T_n-\theta) - \lambda(T_n-\theta)^2] \quad (5.1)$$

when minimised with respect to $\lambda$. Under some conditions this minimum value is equivalent to the limiting value of the difference in the actual amounts of information contained in the sample and in the statistic. It was also shown (Rao, 1961) that for the m.l. estimate the asymptotic variance of (5.1) is the least, thus establishing its highest second order efficiency.


It may be seen that the concepts of first and second order efficiencies are not explicitly linked with any loss function. It is also not important which function of $\theta$ is under estimation. We could, for instance, define first order efficiency as

$$\sqrt{n}\,|Z_n - \beta\{f(T_n)-f(\theta)\}| \to 0$$

in probability for any function $f$ admitting a continuous first derivative. Similarly the second order efficiency could be defined as the minimum asymptotic variance of

$$n[Z_n - \beta\{f(T_n)-f(\theta)\} - \lambda\{f(T_n)-f(\theta)\}^2] \quad (5.2)$$

where $f$ admits a continuous second derivative. The expression for the minimum asymptotic variance in either case (5.1) or (5.2) would be exactly the same. Similarly if $T_n$ is altered as

$$T_n + \frac{g(T_n)}{n}$$

where $g$ is a smooth function, the first and second order efficiencies remain the same although from the point of view of quadratic loss function there would be a difference in terms of order $(1/n^2)$. So the first and second order efficiencies as defined refer to some intrinsic properties of an estimator (statistic) used as a substitute for the whole sample for purposes of inference on the unknown parameter.


In a discussion on my paper (Rao, 1962), Lindley thought that the superiority 
of the m.l. estimate is probably established through some specific loss function impli- 
cit in the definition of second order efficiency. It is, therefore, proposed to compare 
different estimators in a more direct way by assuming a quadratic loss function. 
Before doing this, the procedure has to be cleared of some unpleasantness arising 
out of some samples of relatively small frequency leading to large deviations in the 
estimator and making the expected loss unduly large. We shall, therefore, omit a 
portion of the sample space and compare the performance of estimators over the rest 
of the sample space. Usually the total probability of the portion so omitted rapidly 
diminishes to zero as the sample size increases and the value of the estimator over 
this portion could be defined arbitrarily except that it should be bounded. 

We shall consider the case of the finite multinomial distribution as in the earlier paper (Rao, 1961). Let us represent the theoretical frequencies in the $k$ cells by

$$\pi_1(\theta), \ldots, \pi_k(\theta)$$

where $\theta$ is an unknown parameter, the observed proportions by

$$p_1, \ldots, p_k$$

and the estimating equation by

$$f(\theta, p_1, \ldots, p_k) = 0 \quad (5.3)$$

where $f(\theta, \pi_1(\theta), \ldots, \pi_k(\theta)) \equiv 0$, so that the estimator satisfies Fisher consistency. We shall assume that $f$ as a function of $\theta, p_1, \ldots, p_k$ admits third order partial derivatives which are bounded in a closed region $P$ of values of $(p_1, \ldots, p_k)$ and for values of $\theta$ satisfying (5.3) with $(p_1, \ldots, p_k) \in P$. The true point $\pi_1(\theta), \ldots, \pi_k(\theta)$ is assumed to be an interior point of $P$. Let $\theta^*$ be a solution of the equation (5.3) such that $\theta^* \to \theta$ as $p_i \to \pi_i(\theta)$. Then expanding $f(\theta^*, p_1, \ldots, p_k)$ by Taylor's theorem at $\theta, \pi_1(\theta), \ldots, \pi_k(\theta)$, we have

$$(\theta^*-\theta)\frac{\partial f}{\partial\theta} + \sum_r \frac{\partial f}{\partial\pi_r}(p_r-\pi_r) = -\frac{1}{2}\sum_r\sum_s \frac{\partial^2 f}{\partial\pi_r\,\partial\pi_s}(p_r-\pi_r)(p_s-\pi_s) - \frac{1}{2}(\theta^*-\theta)^2\frac{\partial^2 f}{\partial\theta^2} - (\theta^*-\theta)\sum_r \frac{\partial^2 f}{\partial\theta\,\partial\pi_r}(p_r-\pi_r) + \epsilon. \quad (5.4)$$

Due to the boundedness of the third order partial derivatives, if we define $\theta^*$ arbitrarily in $\Omega - P$, except that it should be bounded, it follows that
B(O”—6) = O(n), Ke) = O(n). 


If the equation (5.3) is such that first order efficiency is satisfied then, as shown in (Rao, 1961),

$$\frac{\partial f}{\partial\pi_r}\Big/\frac{\partial f}{\partial\theta} = -\frac{1}{i}\,\frac{\pi_r'}{\pi_r}$$

in which case, dividing (5.4) by $\partial f/\partial\theta$, the left hand side expression can be written

$$\theta^* - \theta - Z_n(\theta)/i$$

where $Z_n = \sum (\pi_r'/\pi_r)(p_r-\pi_r)$. If the right hand side of (5.4) without $\epsilon$, divided by $\partial f/\partial\theta$ and $(\theta^*-\theta)$ replaced by $Z_n/i$, is represented by $S_n$, we have the approximate relation

$$\theta^* - \theta - Z_n/i \sim S_n. \quad (5.5)$$

Hence

$$E(\theta^*-\theta) \sim E(S_n) = b(\theta)/n$$

where $b(\theta)/n$ is the bias in the estimator up to terms of $O(1/n)$. Such a bias has no effect if the mean square error is evaluated up to terms of $O(1/n)$. Otherwise correction for bias seems to be necessary. The correction can easily be done by considering the estimator

$$\hat\theta = \theta^* - \frac{b(\theta^*)}{n}$$


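in which the bias is $o(1/n)$. We shall evaluate $E(\hat\theta-\theta)^2$ up to terms of $O(1/n^2)$.

Consider the approximate relationship

$$E(\theta^*) \sim \theta + \frac{b(\theta)}{n}$$

which on differentiation with respect to $\theta$ yields

$$nE(\theta^* Z_n) \sim 1 + \frac{b'(\theta)}{n}. \quad (5.6)$$

Further

$$V(\hat\theta) \sim V(\theta^* - Z_n b'(\theta)/ni) \sim V(\theta^*) - 2b'(\theta)/n^2 i$$

using (5.6) and

$$V(\theta^* - Z_n/i) = V(\theta^*) + V(Z_n/i) - 2\,\mathrm{cov}(\theta^*, Z_n/i)$$
$$= V(\theta^*) + \frac{1}{ni} - \frac{2}{ni}\Big(1 + \frac{b'}{n}\Big)$$
$$= V(\theta^*) - \frac{2b'}{n^2 i} - \frac{1}{ni}$$
$$= V(\hat\theta) - \frac{1}{ni}. \quad (5.7)$$

From (5.5), $V(\theta^* - Z_n/i) \sim V(S_n) = \psi(\theta)/n^2$ (say). Using (5.7) we have

$$V(\hat\theta) = \frac{1}{ni} + \frac{\psi(\theta)}{n^2}. \quad (5.8)$$

We shall compute $\psi(\theta)$ for some methods of estimation and compare the values. The variance of $\theta^*$, without correction for bias, is

$$\frac{1}{ni} + \frac{\psi(\theta)}{n^2} + \frac{2b'(\theta)}{n^2 i}.$$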
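The quantities $i$, $Z_n$ and the leading variance term $1/ni$ can be illustrated on a small multinomial model. The sketch below is not from the paper: the three-cell model $\pi(\theta) = (\theta^2, 2\theta(1-\theta), (1-\theta)^2)$ and all function names are assumptions chosen for illustration. For this particular model the m.l. estimate is $(2n_1+n_2)/2n$, it coincides with $\theta + Z_n/i$, and its exact variance equals $1/ni$, so the terms of order $1/n^2$ vanish.

```python
from math import comb


def cell_probs(theta):
    """Hypothetical three-cell model (an assumption, not taken from the paper)."""
    return [theta**2, 2 * theta * (1 - theta), (1 - theta) ** 2]


def cell_derivs(theta):
    """Derivatives pi_r'(theta) of the cell probabilities above."""
    return [2 * theta, 2 - 4 * theta, -2 * (1 - theta)]


def fisher_information(theta):
    """i(theta) = sum_r pi_r'^2 / pi_r, the information per observation."""
    return sum(d * d / p for p, d in zip(cell_probs(theta), cell_derivs(theta)))


def score_z(theta, props):
    """Z_n = sum_r (pi_r'/pi_r)(p_r - pi_r) for observed proportions props."""
    return sum(d / p * (q - p)
               for p, d, q in zip(cell_probs(theta), cell_derivs(theta), props))


def exact_moments(theta, n):
    """Exact mean and variance of (2*n1 + n2)/(2*n) by enumerating all samples."""
    pi = cell_probs(theta)
    mean = second = 0.0
    for n1 in range(n + 1):
        for n2 in range(n - n1 + 1):
            n3 = n - n1 - n2
            prob = (comb(n, n1) * comb(n - n1, n2)
                    * pi[0] ** n1 * pi[1] ** n2 * pi[2] ** n3)
            est = (2 * n1 + n2) / (2 * n)
            mean += prob * est
            second += prob * est * est
    return mean, second - mean * mean


theta, n = 0.3, 20
info = fisher_information(theta)
mean, var = exact_moments(theta, n)
print(info, mean, var, 1 / (n * info))
```

Models in which $S_n$ does not vanish would show a bias of order $1/n$ and a variance term of order $1/n^2$; the enumeration in `exact_moments` could be reused for such models by swapping in a different estimator.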
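(i) Maximum likelihood. For the method of maximum likelihood (m.l.)

$$S_n = \frac{Z_n(W_n - gZ_n)}{i^2} - \frac{\mu_{11}Z_n^2}{2i^3}$$

where

$$W_n = \sum (d^2\log\pi_r/d\theta^2)(p_r-\pi_r), \quad g = (\mu_{11}-\mu_{30})/i,$$

$$\mu_{rs} = \sum \pi_j (\pi_j'/\pi_j)^r (\pi_j''/\pi_j)^s.$$

The bias in the estimator and the value of $\psi(\theta)$ are

$$\frac{b(\theta)}{n} = -\frac{\mu_{11}}{2ni^2},$$

$$\psi(\theta) = \frac{n^2}{i^4}V[Z_n(W_n-gZ_n)] + \frac{n^2\mu_{11}^2}{4i^6}V(Z_n^2)$$
$$= \frac{\mu_{02} - 2\mu_{21} + \mu_{40}}{i^3} - \frac{1}{i} - \frac{(\mu_{11}-\mu_{30})^2}{i^4} + \frac{\mu_{11}^2}{2i^4}$$
$$= \psi(\text{m.l.}). \quad (5.10)$$

The variance of the m.l. estimator without correction for bias is

$$\frac{1}{ni} + \frac{\psi(\text{m.l.})}{n^2} - \frac{1}{n^2 i}\,\frac{d}{d\theta}\Big(\frac{\mu_{11}}{i^2}\Big)$$

which agrees with the expression given by Haldane and Smith (1956). It may be seen that $\psi(\text{m.l.})$ is connected with $E_2(\text{m.l.})$, the index of second order efficiency defined in the earlier paper (Rao, 1961) by the relation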


y(n.) = Bafa.) +o 


It may be seen that the m.l. estimator corrected for bias is similar to the esti- 
mator given by Lindley (1961). For other properties of m.l. estimators reference may 
be made to papers by Cramér (1946), Daniels (1961), Doob (1934, 1936), LeCam (1953, 
1956), Rao (1957, 1958, 1960a), Wald (1949) and others. Uniform consistency and 
convergence to normality of m.l. estimators are considered by Kraft (1955) and 
Parzen (1954). 

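(ii) Minimum chi-square. A theoretical investigation of the asymptotic properties of minimum chi-square estimates is contained in papers by Neyman (1949) and Rao (1955). The estimating equation is

$$\sum \frac{\pi_r' p_r^2}{\pi_r^2} = 0$$

and the value of $S_n$ is

$$\frac{Z_n(W_n - gZ_n)}{i^2} - \frac{\mu_{11}Z_n^2}{2i^3} + \Big(Q + \frac{\mu_{30}}{2i^3}Z_n^2\Big)$$

where

$$Q = \frac{1}{2i}\sum \frac{\pi_r'}{\pi_r^2}(p_r-\pi_r)^2 - \frac{Z_n}{i^2}\sum\Big(\frac{\pi_r'}{\pi_r}\Big)^2(p_r-\pi_r).$$

By using the expressions already derived in Rao (1961), the bias in the minimum chi-square estimate and the value of $\psi(\theta)$ are

$$\frac{b(\theta)}{n} = \frac{1}{n}\Big\{\frac{1}{2i}\sum\frac{\pi_r'}{\pi_r} - \frac{\mu_{30}+\mu_{11}}{2i^2}\Big\}$$

$$\psi(\theta) = n^2V(S_n) = \delta + \psi(\text{m.l.})$$

where

$$\delta = \frac{1}{2i^2}\sum\Big(\frac{\pi_r'}{\pi_r}\Big)^2 - \frac{\mu_{40}}{i^3} + \frac{\mu_{30}^2}{2i^4},$$

which is non-negative and zero only in special cases.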
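(iii) Minimum modified chi-square (Neyman, 1949). The estimating equation is

$$\sum \frac{\pi_r\pi_r'}{p_r} = 0$$

leading to the value of $S_n$

$$\frac{Z_n(W_n - gZ_n)}{i^2} - \frac{\mu_{11}Z_n^2}{2i^3} - 2\Big(Q + \frac{\mu_{30}}{2i^3}Z_n^2\Big).$$

The bias and $\psi(\theta)$ are

$$\frac{b(\theta)}{n} = \frac{1}{n}\Big\{-\frac{1}{i}\sum\frac{\pi_r'}{\pi_r} + \frac{2\mu_{30}-\mu_{11}}{2i^2}\Big\}$$

$$\psi(\theta) = 4\delta + \psi(\text{m.l.}).$$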
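(iv) Haldane's minimum discrepancy (Haldane, 1953). The estimating equation, after a slight modification which does not affect the treatment of the present paper, is

$$\sum \frac{\pi_r^k\,\pi_r'}{p_r^k} = 0$$

giving the value of $S_n$

$$\frac{Z_n(W_n - gZ_n)}{i^2} - \frac{\mu_{11}Z_n^2}{2i^3} - (k+1)\Big(Q + \frac{\mu_{30}}{2i^3}Z_n^2\Big).$$

The bias and $\psi(\theta)$ are

$$\frac{b(\theta)}{n} = \frac{1}{n}\Big\{-\frac{k+1}{2i}\sum\frac{\pi_r'}{\pi_r} + \frac{(k+1)\mu_{30}-\mu_{11}}{2i^2}\Big\}$$

$$\psi(\theta) = (k+1)^2\delta + \psi(\text{m.l.}).$$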


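(v) Minimum Hellinger distance. The estimating equation is

$$\sum \pi_r'\sqrt{p_r/\pi_r} = 0$$

giving the value of $S_n$

$$\frac{Z_n(W_n - gZ_n)}{i^2} - \frac{\mu_{11}Z_n^2}{2i^3} - \frac{1}{2}\Big(Q + \frac{\mu_{30}}{2i^3}Z_n^2\Big).$$

The bias and $\psi(\theta)$ are

$$\frac{b(\theta)}{n} = \frac{1}{n}\Big\{-\frac{1}{4i}\sum\frac{\pi_r'}{\pi_r} + \frac{\mu_{30}-2\mu_{11}}{4i^2}\Big\}$$

$$\psi(\theta) = \frac{\delta}{4} + \psi(\text{m.l.}).$$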


(vi) Minimum Kullback-Leibler separator. The estimating equation is

$$\sum \pi_r'\log\frac{\pi_r}{p_r} = 0$$

giving the value of $S_n$

$$\frac{Z_n(W_n - gZ_n)}{i^2} - \frac{\mu_{11}Z_n^2}{2i^3} - \Big(Q + \frac{\mu_{30}}{2i^3}Z_n^2\Big).$$

The values of bias and $\psi(\theta)$ are

$$\frac{b(\theta)}{n} = \frac{1}{n}\Big\{-\frac{1}{2i}\sum\frac{\pi_r'}{\pi_r} + \frac{\mu_{30}-\mu_{11}}{2i^2}\Big\}$$

$$\psi(\theta) = \delta + \psi(\text{m.l.}).$$

It is seen that among the six methods compared, the mean square error in the estimator corrected for bias is the least in the case of the m.l., when terms up to the order $(1/n^2)$ are considered. It may be shown more generally (following the mechanism developed in the earlier paper, Rao 1961) that under the assumptions made on the estimating equation $f(\theta, p) = 0$, the m.l. estimator has the least value for $\psi(\theta)$.
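The methods compared above can be tried out numerically. The sketch below is an illustration under stated assumptions, not the paper's procedure: it uses a hypothetical three-cell model, minimises the usual textbook form of each discrepancy directly (instead of solving the estimating equations), uses a simple golden-section search that presumes a single minimum on (0, 1), and omits Haldane's discrepancy. It shows Fisher consistency: when the observed proportions equal $\pi(\theta_0)$ exactly, every method returns $\theta_0$.

```python
from math import log, sqrt


def cell_probs(theta):
    """Hypothetical three-cell model (an assumption, not taken from the paper)."""
    return [theta**2, 2 * theta * (1 - theta), (1 - theta) ** 2]


# Discrepancies between observed proportions p and model frequencies q.
CRITERIA = {
    "maximum likelihood": lambda p, q: -sum(a * log(b) for a, b in zip(p, q)),
    "minimum chi-square": lambda p, q: sum((a - b) ** 2 / b for a, b in zip(p, q)),
    "modified minimum chi-square": lambda p, q: sum((a - b) ** 2 / a for a, b in zip(p, q)),
    "minimum Hellinger distance": lambda p, q: sum((sqrt(a) - sqrt(b)) ** 2 for a, b in zip(p, q)),
    "minimum K.-L. separator": lambda p, q: sum(b * log(b / a) for a, b in zip(p, q)),
}


def golden_min(fn, lo=1e-6, hi=1 - 1e-6, tol=1e-10):
    """Golden-section search for the minimiser of a unimodal function on [lo, hi]."""
    ratio = (sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c, d = b - ratio * (b - a), a + ratio * (b - a)
        if fn(c) < fn(d):
            b = d
        else:
            a = c
    return (a + b) / 2


def estimates(props):
    """Estimate theta from observed proportions by each criterion."""
    return {name: golden_min(lambda t, c=crit: c(props, cell_probs(t)))
            for name, crit in CRITERIA.items()}


exact = estimates(cell_probs(0.3))          # proportions equal to pi(0.3)
perturbed = estimates([0.12, 0.40, 0.48])   # hypothetical observed proportions
for name in CRITERIA:
    print(name, round(exact[name], 6), round(perturbed[name], 6))
```

With perturbed proportions the estimates differ from one another by small amounts, which is the kind of difference the bias and $\psi(\theta)$ terms above describe to order $1/n$ and $1/n^2$.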


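The bias and variance for estimators corrected for bias, obtained by the different methods considered in this section, are given below, where $\delta$ and $\psi(\text{m.l.})$ are as defined in (5.10) and (5.11).

| method of estimation | bias (coefficient of $n^{-1}$) | variance of estimator corrected for bias: coefficient of $n^{-1}$ | coefficient of $n^{-2}$ |
|---|---|---|---|
| maximum likelihood | $-\dfrac{\mu_{11}}{2i^2}$ | $\dfrac{1}{i}$ | $\psi(\text{m.l.})$ |
| minimum chi-square | $\dfrac{1}{2i}\sum\dfrac{\pi_r'}{\pi_r} - \dfrac{\mu_{30}+\mu_{11}}{2i^2}$ | $\dfrac{1}{i}$ | $\delta + \psi(\text{m.l.})$ |
| modified minimum chi-square | $-\dfrac{1}{i}\sum\dfrac{\pi_r'}{\pi_r} + \dfrac{2\mu_{30}-\mu_{11}}{2i^2}$ | $\dfrac{1}{i}$ | $4\delta + \psi(\text{m.l.})$ |
| Haldane's minimum discrepancy | $-\dfrac{k+1}{2i}\sum\dfrac{\pi_r'}{\pi_r} + \dfrac{(k+1)\mu_{30}-\mu_{11}}{2i^2}$ | $\dfrac{1}{i}$ | $(k+1)^2\delta + \psi(\text{m.l.})$ |
| minimum Hellinger distance | $-\dfrac{1}{4i}\sum\dfrac{\pi_r'}{\pi_r} + \dfrac{\mu_{30}-2\mu_{11}}{4i^2}$ | $\dfrac{1}{i}$ | $\dfrac{\delta}{4} + \psi(\text{m.l.})$ |
| minimum K.-L. separator | $-\dfrac{1}{2i}\sum\dfrac{\pi_r'}{\pi_r} + \dfrac{\mu_{30}-\mu_{11}}{2i^2}$ | $\dfrac{1}{i}$ | $\delta + \psi(\text{m.l.})$ |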


Expressions for bias and variance will be similar in the case of estimation of parameters in a continuous distribution. The conditions to be assumed on the estimating equation and the probability density will be very severe if an expansion of the asymptotic variance up to terms of order $(1/n^2)$ is desired. A recent paper by Linnik and Mitrafanova (1963) on the computation of the variance of the m.l. estimator in a continuous case shows the nature of the complexities involved.


REFERENCES

BAHADUR, R. R. (1960): On asymptotic efficiency of tests and estimates. Sankhya, 22, 229-252.

CRAMÉR, H. (1946): Mathematical Methods of Statistics. Princeton University Press.

DANIELS, H. E. (1961): The asymptotic efficiency of a maximum likelihood estimator. Proc. Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1, 151-164.

DOOB, J. L. (1934): Probability and statistics. Trans. Amer. Math. Soc., 36, 759-772.

DOOB, J. L. (1936): Statistical estimation. Trans. Amer. Math. Soc., 39, 410-421.

FISHER, R. A. (1922): On the mathematical foundations of theoretical statistics. Philos. Trans. Roy. Soc., A, 222, 309-365.

FISHER, R. A. (1925): Theory of statistical estimation. Proc. Camb. Phil. Soc., 22, 700-725.

HALDANE, J. B. S. and SMITH, SHEILA MAYNARD (1956): The sampling distribution of a maximum likelihood estimate. Biom., 43, 96-103.

HALDANE, J. B. S. (1953): A class of efficient estimates of a parameter. Bull. Int. Stat. Inst., 33, 231.

KALLIANPUR, G. (1963): Von Mises functionals and maximum likelihood estimation. Contributions to Statistics, presented to Professor P. C. Mahalanobis on his 70th birthday.

KALLIANPUR, G. and RAO, C. R. (1955): On Fisher's lower bound to asymptotic variance of a consistent estimate. Sankhya, 15, 331-342.

KRAFT, CHARLES H. (1955): Some conditions for consistency and uniform consistency of statistical procedures. University of California Publications in Statistics, 2, 125-42.

LECAM, L. (1953): On some asymptotic properties of maximum likelihood estimates and related Bayes' estimates. University of California Publications in Statistics, 1, 227-330.

LECAM, L. (1956): On the asymptotic theory of estimation and testing hypotheses. Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Berkeley and Los Angeles, University of California Press, 1, 129-156.

LECAM, L. (1960): Locally asymptotically normal families of distributions. University of California Publications in Statistics, 3, 37-98.

LINDLEY, D. V. (1961): The use of prior probability distributions in statistical inference and decisions. Proc. Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1, 453-468.

LINNIK, YU. V. and MITRAFANOVA, N. M. (1963): Some asymptotic expansions for the distribution of the maximum likelihood estimate. Contributions to Statistics, presented to Professor P. C. Mahalanobis on his 70th birthday.

PARZEN, E. (1954): On uniform convergence of families of sequences of random variables. University of California Publications in Statistics, 2, 23-54.

RAO, C. R. (1955): Theory of the method of estimation by minimum chi-square. Bull. Int. Statist. Inst., 35, 25-32.

RAO, C. R. (1957): Maximum likelihood estimation for multinomial distribution. Sankhya, 18, 139-148.

RAO, C. R. (1958): Maximum likelihood estimation for the multinomial distribution with an infinite number of cells. Sankhya, 20, 211-218.

RAO, C. R. (1960a): A study of large sample test criteria through properties of efficient estimates. Sankhya, 23, 25-40.

RAO, C. R. (1960b): Apparent anomalies and irregularities in maximum likelihood estimation. 32nd Session of the Int. Stat. Inst., Tokyo. Reprinted with discussion in Sankhya, 24, 73-102.

RAO, C. R. (1961): Asymptotic efficiency and limiting information. Proc. Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1, 531-546.

RAO, C. R. (1962): Efficient estimates and optimum inference procedures in large samples (with discussion). J. Roy. Stat. Soc., 24, 1, 46-72.

WALD, A. (1949): A note on the consistency of maximum likelihood estimation. Ann. Math. Stat., 20, 595-601.


SOME REMARKS ON THE POWER OF A MOST 
POWERFUL TEST* 


By L. SCHMETTERER 


Mathematical Institute, Vienna University
and 
Statistical Laboratory, The Catholic University, Washington 


SUMMARY. Consider test problems for a simple hypothesis against a simple alternative. It is noticed that the quotient of the power of the most powerful test and the size $\alpha$ of the test is a non-increasing function of $\alpha$. The behaviour of this quotient near $\alpha = 0$ is studied.

Let $(R, S)$ be a measurable space, that is: $R$ is a nonempty set and $S$ a $\sigma$-algebra of subsets of $R$. Let $P_0$ and $P_1$ be two different probability measures defined over $(R, S)$. Let $\alpha$ be a real number with $0 < \alpha < 1$. An $S$-measurable map $\varphi_\alpha$ from $R$ into $[0, 1]$ is called a test of size $\alpha$ for $P_0$ against $P_1$ if

$$E(\varphi_\alpha; P_0) = \int \varphi_\alpha(x)\,dP_0 \le \alpha.$$

$E(\varphi_\alpha; P_1)$ is called the power of the test. The set $\Phi_\alpha$ of all tests of size $\alpha$ is not empty. The map $\tau_\alpha$ defined by $\tau_\alpha(x) = \alpha$ for every $x \in R$ is an element of $\Phi_\alpha$. $\tau_\alpha$ is called the trivial test. For every pair $P_0, P_1$ and for every $\alpha$ there exists at least one $\bar\varphi_\alpha \in \Phi_\alpha$ with the following property:

$$E(\varphi_\alpha; P_1) \le E(\bar\varphi_\alpha; P_1)$$

for all $\varphi_\alpha \in \Phi_\alpha$. This is nothing other than Neyman-Pearson's fundamental lemma. $\bar\varphi_\alpha$ is called a most powerful test for $P_0$ against $P_1$. The aim of this paper is to study the map $\alpha \to E(\bar\varphi_\alpha; P_1)$ for the set of all pairs $(P_0, P_1)$ with $P_0 \ne P_1$. This last condition will not be repeated in the sequel. We will denote this map by $\alpha \to \beta(\alpha)$ whatever the pair $(P_0, P_1)$ may be.
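Lemma 1: For $\alpha$ with $0 < \alpha < 1$ we have $\alpha \le \beta(\alpha) \le 1$.

For the proof it is enough to compare $\tau_\alpha$ and $\bar\varphi_\alpha$, which is well known of course.

Lemma 2: $\beta$ is nondecreasing in $0 < \alpha < 1$.

This follows from the definition of $\beta$.

Theorem: $\beta$ is continuous in $0 < \alpha < 1$ and the map $\alpha \to \beta(\alpha)/\alpha$ is non-increasing. Furthermore, $\lim_{\alpha \to 1-0} \beta(\alpha)/\alpha = 1$ and $1 \le \beta(\alpha)/\alpha \le 1/\alpha$, $0 < \alpha < 1$.

Proof: Let $\alpha_1, \alpha_2$ be real numbers with $0 < \alpha_1 < 1$, $0 < \alpha_2 < 1$. Let $t_1, t_2$ be positive numbers with $t_1 + t_2 = 1$. Obviously, $0 < t_1\alpha_1 + t_2\alpha_2 < 1$. Let $\bar\varphi_{\alpha_1}$ and $\bar\varphi_{\alpha_2}$ be most powerful tests. It follows that $0 \le t_1\bar\varphi_{\alpha_1} + t_2\bar\varphi_{\alpha_2} \le 1$ and

$$E(t_1\bar\varphi_{\alpha_1} + t_2\bar\varphi_{\alpha_2}; P_0) = t_1E(\bar\varphi_{\alpha_1}; P_0) + t_2E(\bar\varphi_{\alpha_2}; P_0) \le t_1\alpha_1 + t_2\alpha_2$$

and this means that $t_1\bar\varphi_{\alpha_1} + t_2\bar\varphi_{\alpha_2} \in \Phi_{t_1\alpha_1 + t_2\alpha_2}$. Therefore, we get the inequality:

$$E(t_1\bar\varphi_{\alpha_1} + t_2\bar\varphi_{\alpha_2}; P_1) \le E(\bar\varphi_{t_1\alpha_1 + t_2\alpha_2}; P_1)$$

or

$$t_1\beta(\alpha_1) + t_2\beta(\alpha_2) \le \beta(t_1\alpha_1 + t_2\alpha_2). \quad (1)$$

Lemma 2 together with (1) proves the continuity of $\beta$ for $0 < \alpha < 1$. Therefore $\beta(0+0)$ exists and is $\ge 0$. From this and (1) we obtain the monotonicity of the function $\alpha \to \beta(\alpha)/\alpha$. It is sufficient to take $\alpha_2 = \gamma$, $t_2 = \delta/\gamma$, $0 < \delta < \gamma < 1$ (all other cases being trivial) and to make $\alpha_1 \to 0+0$ in (1).

* This work was done with support from National Science Foundation Grant GP-96.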


The last two statements of the theorem are obvious. Connected with this theorem are several problems which also may be of practical interest. For instance, are there test problems, that is pairs $(P_0, P_1)$, such that $\beta(\alpha)/\alpha = 1$ for all $\alpha$? What is the $\lim_{\alpha \to 0+0} \beta(\alpha)/\alpha$ (which must exist according to the theorem)? Can the upper bound in the inequality $\beta(\alpha)/\alpha \le 1/\alpha$, $0 < \alpha < 1$, be attained? And so on. To begin with the last problem, the answer is an easy one.

Lemma 3: If $P_0$ and $P_1$ are orthogonal then $\beta(\alpha)/\alpha = 1/\alpha$ for $0 < \alpha < 1$.

For the proof it is enough to consider densities $f_0$ resp. $f_1$ of $P_0$ resp. $P_1$ relative to a dominant measure $\mu$ and to define for every $\alpha$ the most powerful test

$$\bar\varphi_\alpha(x) = \begin{cases} 1 & x \in \{x : f_1(x) > 0\} \\ \alpha & x \in \{x : f_1(x) = 0\}. \end{cases}$$

It follows that $\beta(\alpha)/\alpha = 1/\alpha$ for $0 < \alpha < 1$.

Concerning the $\lim_{\alpha \to 0+0} \beta(\alpha)/\alpha$ we would like to consider first a very well known and important example.


Example I: Consider two normal distributions $P_0$ and $P_1$ defined over the Borel sets of the Euclidean $R_1$ given by the densities $\frac{1}{\sqrt{2\pi}}e^{-x^2/2}$ and $\frac{1}{\sqrt{2\pi}}e^{-(x-a)^2/2}$ for all $x \in R_1$, with $a > 0$. For every $\alpha$ let $k(\alpha)$ be the unique solution of

$$\frac{1}{\sqrt{2\pi}}\int_{k(\alpha)}^{\infty} e^{-x^2/2}\,dx = \alpha.$$

The most powerful test for $P_0$ against $P_1$ is given by

$$\bar\varphi_\alpha(x) = \begin{cases} 1 & k(\alpha) \le x < \infty \\ 0 & -\infty < x < k(\alpha). \end{cases}$$

It follows that

$$\beta(\alpha)/\alpha = \int_{k(\alpha)-a}^{\infty} e^{-x^2/2}\,dx \Big/ \int_{k(\alpha)}^{\infty} e^{-x^2/2}\,dx, \quad \text{with } k(\alpha) \sim \sqrt{2\log(1/\alpha)} \text{ for } \alpha \to 0+0.^*$$

Therefore $\lim_{\alpha \to 0+0} \beta(\alpha)/\alpha = \infty$.

* If $f$ and $g$ are two real functions then $f(x) \sim g(x)$ for $x \to a$ means $\lim_{x \to a} f(x)/g(x) = 1$.

Another simple and important example is the following:

Example II: Let $P_0$ be the same distribution as in Example I and $P_1$ a normal distribution given by the density $\frac{1}{\sigma\sqrt{2\pi}}e^{-x^2/2\sigma^2}$ for all $x \in R_1$, with $\sigma > 1$. It is easy to see that in this case $\lim_{\alpha \to 0+0} \beta(\alpha)/\alpha = \infty$ also. This leads to the conjecture that a most powerful test is always "infinitely better" than the trivial test for $\alpha \to 0+0$. But it is easy to disprove this conjecture. A most powerful test can be "almost as bad" as the trivial test.
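The behaviour of $\beta(\alpha)/\alpha$ in Example I can be tabulated directly. This is a small illustrative sketch, not part of the paper; the function name, the choice $a = 1$ and the grid of sizes are assumptions. It uses the standard library's `statistics.NormalDist` for the normal distribution function and its inverse.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1)


def power_ratio(alpha, a):
    """beta(alpha)/alpha for N(0,1) against N(a,1), a > 0, using the test x >= k(alpha)."""
    k = STD_NORMAL.inv_cdf(1 - alpha)       # size condition: P0(X >= k) = alpha
    beta = 1 - STD_NORMAL.cdf(k - a)        # power: P1(X >= k)
    return beta / alpha


# Hypothetical grid of sizes, from very small to close to one.
alphas = [1e-6, 1e-4, 1e-2, 0.1, 0.5, 0.9]
ratios = [power_ratio(alpha, a=1.0) for alpha in alphas]
for alpha, ratio in zip(alphas, ratios):
    print(alpha, ratio)
```

The printed ratios decrease as $\alpha$ grows, stay between 1 and $1/\alpha$ as the theorem requires, and grow without bound as $\alpha$ approaches 0.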


Lemma 4: Let $f$ be a density relative to Lebesgue's measure in the $R_1$ with the following properties: For all $x \in R_1$, $f(x) = f(-x)$; $f$ is continuous and strictly decreasing for $x > 0$. Let $\sigma$ be a real number $> 1$ and suppose that $x \to f(x/\sigma)/f(x)$ is strictly increasing for $x > 0$. If $P_0$ is given by $f$ and $P_1$ by the density $\frac{1}{\sigma}f(x/\sigma)$ then $\lim_{\alpha \to 0+0} \beta(\alpha)/\alpha$ is finite or infinite according as $x \to f(x/\sigma)/f(x)$ is bounded or not.

Proof: It follows from the assumptions made that $\lim_{x \to \infty} f(x/\sigma)/f(x)$ exists. If it is finite then it is $\ge \sigma$. If for every $\alpha$, $g(\alpha)$ is the unique solution of the equation

$$\int_{g(\alpha)}^{\infty} f(x)\,dx = \alpha/2$$

then the most powerful test is given by

$$\bar\varphi_\alpha(x) = \begin{cases} 1 & g(\alpha) \le x < \infty, \ -\infty < x \le -g(\alpha) \\ 0 & \text{in the complement.} \end{cases}$$

It follows that

$$\beta(\alpha)/\alpha = \frac{2}{\sigma\alpha}\int_{g(\alpha)}^{\infty} f(x/\sigma)\,dx$$

and

$$\lim_{\alpha \to 0+0} \beta(\alpha)/\alpha = \lim_{x \to \infty} \frac{1}{\sigma}\,\frac{f(x/\sigma)}{f(x)}$$

and this limit is finite or infinite according as $x \to f(x/\sigma)/f(x)$ is bounded or not.

Example III: The Cauchy distribution is an example which satisfies all the assumptions of Lemma 4. Let $P_0$ be given by the density $\frac{1}{\pi}\,\frac{1}{1+x^2}$ and $P_1$ by $\frac{1}{\pi\sigma}\,\frac{1}{1+x^2/\sigma^2}$, $\sigma > 1$. Therefore $\lim_{\alpha \to 0+0} \beta(\alpha)/\alpha = \sigma$ and $\sigma$ can be chosen as close to 1 as one wants.

In Lemmas 3 and 4 and in the Examples I, II and III the function $\alpha \to \beta(\alpha)/\alpha$ always was strictly decreasing. The conjecture that this always is true is also wrong. We consider

Example IV: Let $P_0$ be given by the density $f_0$ with

$$f_0(x) = \begin{cases} e^{-x} & x \ge 0 \\ 0 & x < 0 \end{cases}$$

and let $P_1$ be given by the density $f_1$ with $f_1(x) = f_0(x-a)$, $a > 0$, for all $x \in R_1$. It is easy to discover that a most powerful test for $P_0$ against $P_1$, if $\alpha \le e^{-a}$, is given by

$$\bar\varphi_\alpha(x) = \begin{cases} \alpha e^{a} & a \le x < \infty \\ 0 & -\infty < x < a. \end{cases}$$

Therefore $\beta(\alpha)/\alpha = \alpha e^{a}/\alpha = e^{a}$ for all $\alpha \le e^{-a}$.
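Examples III and IV can be checked with closed-form tail probabilities. The following is a hedged sketch rather than anything taken from the paper: the function names and the numerical values of $\sigma$, $a$ and $\alpha$ are assumptions. For the Cauchy case it uses $P(|X| \ge g) = 1 - \frac{2}{\pi}\arctan(g/\sigma)$ for scale $\sigma$; for the shifted exponential case it evaluates the size and power of a test that is constant on $[a, \infty)$ and zero elsewhere.

```python
from math import atan, exp, pi, tan


def cauchy_ratio(alpha, sigma):
    """beta(alpha)/alpha for the two-sided test |x| >= g(alpha) in the Cauchy scale problem."""
    g = tan(pi * (1 - alpha) / 2)            # P0(|X| >= g) = alpha for the standard Cauchy
    beta = 1 - 2 * atan(g / sigma) / pi      # P1(|X| >= g) for the Cauchy with scale sigma
    return beta / alpha


def shifted_exponential_ratio(alpha, a):
    """beta(alpha)/alpha for the shifted exponential problem, valid for alpha <= exp(-a)."""
    level = alpha * exp(a)                   # constant value of the test on [a, infinity)
    size = level * exp(-a)                   # P0(X >= a) = exp(-a)
    power = level * 1.0                      # P1(X >= a) = 1
    return power / size


sigma, a = 2.0, 1.5                          # hypothetical parameter values
small_alpha_ratio = cauchy_ratio(1e-6, sigma)
exp_ratios = [shifted_exponential_ratio(alpha, a) for alpha in (1e-4, 1e-2, 0.2)]
print(small_alpha_ratio, exp_ratios, exp(a))
```

The Cauchy ratio approaches $\sigma$ as $\alpha$ tends to 0, so it stays bounded, while the exponential ratio is the same constant $e^{a}$ for every admissible $\alpha$, so it is not strictly decreasing.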


The last problem to be considered is whether there exists a pair $(P_0, P_1)$ such that the trivial test is a most powerful test. The answer is of course "no" (Lehmann, 1959). Let $f_0$ resp. $f_1$ be densities of $P_0$ resp. $P_1$ relative to a dominant measure $\mu$. There exists an $S$-measurable set $E$ of positive $\mu$-measure such that $f_1(x) > f_0(x)$ for $x \in E$. There exists also an $\epsilon > 0$ and a set $E_1 \subset E$ such that $f_1(x) \ge f_0(x) + \epsilon$ for $x \in E_1$ and $\mu(E_1) > 0$. If $f_0(x) = 0$ for all $x \in E_1$, except perhaps on a set of $\mu$-measure zero, choose the test

$$\varphi_\alpha(x) = \begin{cases} 1 & x \in E_1 \\ \alpha & x \in R - E_1, \end{cases}$$

$0 < \alpha < 1$. Then $E(\varphi_\alpha; P_1) = \alpha P_1(R - E_1) + P_1(E_1) > \alpha$.

If $\int_{E_1} f_0(x)\,d\mu = \alpha_1 > 0$, choose the test

$$\varphi_{\alpha_1}(x) = \begin{cases} 1 & x \in E_1 \\ 0 & x \in R - E_1. \end{cases}$$

Then

$$E(\varphi_{\alpha_1}; P_1) = \int_{E_1} f_1(x)\,d\mu \ge \int_{E_1} f_0(x)\,d\mu + \epsilon\mu(E_1) = \alpha_1 + \epsilon\mu(E_1) > \alpha_1.$$

From the foregoing considerations of the lemmas and examples we have the conclusion that beyond the statements of the theorem no further general statement concerning the behaviour of $\alpha \to \beta(\alpha)/\alpha$ can be made.


Suppose that $\Gamma$ is an arbitrary set of indices $\gamma$ and let $P_\gamma$, $\gamma \in \Gamma$, be a family of regular Radon probability measures over the Borel sets $B$ of some locally compact space $R$. Let $P_1$ be another measure of this kind and different from all $P_\gamma$'s. A test of size $\alpha$, $0 < \alpha < 1$, is again a measurable map $\varphi_\alpha$ from $R$ into $[0, 1]$ with the property

$$E(\varphi_\alpha; P_\gamma) \le \alpha$$

for all $\gamma \in \Gamma$. Suppose that the set of all measures $P_\gamma$, $\gamma \in \Gamma$, and $P_1$ are dominated by a Radon measure $\mu$ over $(R, B)$. Due to the weak compactness theorem (Lehmann, 1959) there always exists a most powerful test $\bar\varphi_\alpha$. Define $\beta$ as before. It is easy to see that Lemma 1, Lemma 2 and the theorem remain true for this more general case also.


REFERENCE

LEHMANN, E. L. (1959): Testing Statistical Hypotheses, John Wiley-Chapman, New York-London, 67, 354.

Paper received: December, 1962.

ON THE CONSISTENCY OF LEAST SQUARES REGRESSION

By HERMAN O. A. WOLD
University Institute of Statistics, Uppsala, Sweden

SUMMARY. In classic regression analysis there is a dualism between the treatment of a regression relation (1) in experimental and non-experimental situations, the variables $x_1, \ldots, x_h$ in the first case being regarded as a sequence of fixed numerical values, in the second as random variates having a joint probability distribution. A unified treatment of linear regression is here presented on the basis of eo ipso predictors, a novel name for the old notion of stochastic relations defined in terms of conditional expectations. The key theorem (Section 2) involves some rearrangement and generalization of the customary treatment.


1. EO IPSO PREDICTORS

In a stochastic relation

$$y = f(x_1, \ldots, x_h) + v \quad (1)$$

with

$$E(y \mid x_1, \ldots, x_h) = f(x_1, \ldots, x_h) \quad (2)$$

the function $f(\cdot)$ is called an eo ipso predictor of $y$.

We shall consider the case when $f(x_1, \ldots, x_h)$ is linear, say

$$y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \ldots + \beta_hx_h + v \quad (3)$$

with

$$E(y \mid x_1, \ldots, x_h) = \beta_0 + \beta_1x_1 + \beta_2x_2 + \ldots + \beta_hx_h. \quad (4)$$

Writing

$$x_0 = y \quad (5)$$

we denote the observed values of the variables by

$$x_{0\alpha}, x_{1\alpha}, \ldots, x_{h\alpha} \quad (\alpha = 1, \ldots, n). \quad (6)$$

For the observed first and second order moments we write

$$m_i = \frac{1}{n}\sum_{\alpha=1}^{n} x_{i\alpha}; \quad m_{ik} = \frac{1}{n}\sum_{\alpha=1}^{n} x_{i\alpha}x_{k\alpha} \quad (i, k = 0, 1, \ldots, h). \quad (7)$$

The theoretical moments of first and second order will be denoted

$$\mu_i = E(x_i); \quad \mu_{ik} = E(x_ix_k) \quad (i, k = 0, 1, \ldots, h). \quad (8a\text{-}b)$$

Formulas (8a-b) cover the situation, typical for the analysis of nonexperimental data, when

$$y\,(= x_0), x_1, \ldots, x_h \quad (9)$$

are random variates with a joint probability distribution. According to a fundamental theorem by Kolmogorov (1933), $\mu_0$ can in this case be expressed in terms of (4),

$$\mu_0 = E(y) = E_{x_1, \ldots, x_h}[E(y \mid x_1, \ldots, x_h)] \quad (10)$$

and similarly for $\mu_{01}, \ldots, \mu_{0h}$.

The following situation, typical for the analysis of controlled experiments, is also covered by formulas (8a-b), namely when $y\,(= x_0)$ is a random variate, whereas

$$x_{1\alpha}, \ldots, x_{h\alpha} \quad (\alpha = 1, \ldots, n) \quad (11)$$

are arbitrarily fixed numbers. We shall say that $x_1, \ldots, x_h$ then have a controlled distribution. In a factorial experiment, for example, each $x_i$ may have two equally weighted levels; this gives a controlled distribution formed by $2^h$ sequences of type (11), all having the "probability" $2^{-h}$. Formulas of type (10) are valid also in this case.
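The controlled distribution of a two-level factorial experiment can be enumerated directly. The sketch below is illustrative only: the coding of the two levels as -1 and +1 and the function names are assumptions, not taken from the paper. It lists the $2^h$ sequences, gives each the weight $2^{-h}$, and evaluates the theoretical moments $\mu_i$ and $\mu_{ik}$ of the controlled variables.

```python
from itertools import product


def controlled_distribution(h, levels=(-1.0, 1.0)):
    """All 2**h level combinations with equal weight 2**-h (level coding is assumed)."""
    points = list(product(levels, repeat=h))
    weight = 1.0 / len(points)
    return points, weight


def theoretical_moments(points, weight):
    """First moments mu_i and second moments mu_ik under the controlled distribution."""
    h = len(points[0])
    mu = [sum(weight * x[i] for x in points) for i in range(h)]
    mu2 = [[sum(weight * x[i] * x[k] for x in points) for k in range(h)]
           for i in range(h)]
    return mu, mu2


points, weight = controlled_distribution(3)
mu, mu2 = theoretical_moments(points, weight)
print(len(points), weight)
print(mu)
print(mu2)
```

With this coding the first moments vanish and the second moment matrix is the identity, so the design involves no collinearity among the controlled variables.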


The two situations mentioned cover the cases usually treated in the textbooks, but they do not exhaust all possible specifications of (8a-b).


2. LEAST SQUARES ESTIMATION OF EO IPSO PREDICTORS 


As is clear from (1)-(4), the notion of eo ipso predictor is more or less syno- 
nymous with the notion of regression relation. The similarity extends to the proce- 
dures of parameter estimation, inasmuch as eo ipso predictors under very general 
assumptions can be consistently estimated by least squares regression. 


Theorem: Let (3)-(4) be a linear eo ipso predictor that satisfies the following two assumptions.

(a) The ergodicity assumption:

$$\operatorname*{prob\,lim}_{n \to \infty} m_i = \mu_i; \quad \operatorname*{prob\,lim}_{n \to \infty} m_{ik} = \mu_{ik} \quad (i, k = 0, 1, \ldots, h) \quad (12)$$

or in words, the observed first and second order moments have the corresponding theoretical moments as stochastic limits as the number of observations increases indefinitely;

(b) The nonsingularity assumption:

$$\det[\mu_{ik}] \ne 0 \quad (i, k = 1, \ldots, h) \quad (13)$$

or equivalently, the joint distribution of $x_1, \ldots, x_h$ involves no collinearity.

Then the least squares regression of $y$ on $x_1, \ldots, x_h$, say

$$y = b_0 + b_1x_1 + \ldots + b_hx_h + w, \quad (14)$$

will provide consistent estimates for the parameters; in symbols

$$\operatorname*{prob\,lim}_{n \to \infty} b_i = \beta_i \quad (i = 0, 1, \ldots, h). \quad (15)$$


The proof of the theorem will be arranged in four steps.

(i) The assumptions (3)-(4) imply that the residual $v$ has zero mean and zero cross product moment with each of the variables $x_1, \ldots, x_h$; in symbols,

$$E(v) = 0; \quad E(vx_i) = 0 \quad (i = 1, \ldots, h). \quad (16a\text{-}b)$$

To see this we note that (3) and (4) give

$$E(v \mid x_1, \ldots, x_h) = 0; \quad E(vx_i \mid x_1, \ldots, x_h) = 0. \quad (17a\text{-}b)$$

Next, (17a) gives

$$E(v) = \int E(v \mid x_1, \ldots, x_h)\,dP = 0 \quad (18)$$

where the integration is taken over the joint distribution of $x_1, \ldots, x_h$. This verifies (16a), and using (17b) the same argument gives (16b).

(ii) The assumptions (3)-(4) imply that the parameters $\beta_i$ satisfy the following system of linear relations,

$$\begin{aligned} \beta_0 + \mu_1\beta_1 + \ldots + \mu_h\beta_h &= \mu_0 \\ \mu_1\beta_0 + \mu_{11}\beta_1 + \ldots + \mu_{1h}\beta_h &= \mu_{01} \\ &\ \ \vdots \\ \mu_h\beta_0 + \mu_{h1}\beta_1 + \ldots + \mu_{hh}\beta_h &= \mu_{0h} \end{aligned} \quad (19)$$

To prove, for example, the relation with $\mu_{0i}$ in the right-hand member, we multiply (3) by $x_i$,

$$\beta_0x_i + \beta_1x_1x_i + \ldots + \beta_hx_hx_i + vx_i = yx_i.$$

Taking expectations on both sides, the term that involves $vx_i$ will vanish according to (16b), giving

$$\beta_0\mu_i + \beta_1\mu_{1i} + \ldots + \beta_h\mu_{hi} = \mu_{0i}.$$

(iii) The least squares regression (14), by definition, is formed so as to give the residual $w$ the smallest possible variance. This is the same as to make the square sum

$$S = \sum_{\alpha} (x_{0\alpha} - b_0 - b_1x_{1\alpha} - \ldots - b_hx_{h\alpha})^2 \quad (20)$$

as small as possible. Hence $b_0, b_1, \ldots, b_h$ should satisfy the conditions

$$\frac{\partial S}{\partial b_i} = 0 \quad (i = 0, 1, \ldots, h). \quad (21)$$

This is the same as

$$\begin{aligned} b_0 + m_1b_1 + \ldots + m_hb_h &= m_0 \\ &\ \ \vdots \\ m_hb_0 + m_{h1}b_1 + \ldots + m_{hh}b_h &= m_{0h} \end{aligned} \quad (22)$$

where the moments $m_i$, $m_{ik}$ are given by (7).


(iv) The equation systems (19) and (22) are formally the same as the normal equations for theoretical regression coefficients $\beta_i$ and empirical regression coefficients $b_i$ respectively.

To conclude the argument, we infer from (13) that the solutions $\beta_i$ of system (19) are uniquely determined and are continuous functions of the moments $\mu_i$, $\mu_{ik}$; hence making use of (12), we obtain (15). In conformity with the notations adopted, the stochastic convergence in (12) and (15) may be taken to be convergence in probability. The argument is valid also for other modes of convergence that are preserved under continuous transformations, for example convergence in the sense of the strong law of large numbers (again see Kolmogorov (1933)).

It will be noted that the solutions $\beta_i$ and $b_i$ of systems (19) and (22) as well as the corresponding residuals and residual variances are given by the same determinant expressions as in the traditional treatment of least squares regression (see, for example, Wold and Juréen (1952-53)).
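The four steps of the proof can be mirrored in a small simulation. This is a hedged sketch under stated assumptions, not part of the paper: the data-generating model, the parameter values, the seed and all function names are hypothetical. It builds the observed moments (7), solves the normal equations (22) by Gaussian elimination, and compares the result with the true coefficients when the residual has conditional mean zero but is heteroscedastic.

```python
import random


def simulate(n, beta, seed=0):
    """Draw n rows (y, x1, x2) from y = b0 + b1*x1 + b2*x2 + v with E(v | x1, x2) = 0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = 0.5 * x1 + rng.gauss(0.0, 1.0)        # correlated with x1, not collinear
        v = (1.0 + abs(x1)) * rng.gauss(0.0, 1.0)  # heteroscedastic, conditional mean zero
        rows.append((beta[0] + beta[1] * x1 + beta[2] * x2 + v, x1, x2))
    return rows


def observed_moments(rows):
    """Observed moments m_i and m_ik as in (7), with index 0 standing for y."""
    n, width = len(rows), len(rows[0])
    m = [sum(r[i] for r in rows) / n for i in range(width)]
    m2 = [[sum(r[i] * r[k] for r in rows) / n for k in range(width)]
          for i in range(width)]
    return m, m2


def solve(matrix, rhs):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * size
    for r in reversed(range(size)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, size))
        x[r] = (aug[r][size] - tail) / aug[r][r]
    return x


def least_squares(rows):
    """Solve the normal equations (22) built from the observed moments."""
    m, m2 = observed_moments(rows)
    h = len(m) - 1
    matrix = [[1.0] + m[1:]] + [[m[i]] + m2[i][1:] for i in range(1, h + 1)]
    rhs = [m[0]] + [m2[0][i] for i in range(1, h + 1)]
    return solve(matrix, rhs)


beta = (1.0, 2.0, -0.5)                       # hypothetical true parameters
b_small = least_squares(simulate(200, beta))
b_large = least_squares(simulate(20000, beta))
print(b_small)
print(b_large)
```

The residual here is neither homoscedastic nor independent of the regressors, only conditionally centred, which is the kind of situation the theorem is meant to cover; the estimates from the larger sample lie close to the assumed coefficients.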


3. COMMENTS 


In more or less general form the above theorem is standard material in statisti- 
cal text books ; see Cramér (1945-46) and Plackett (1960). We have given the proof 
in detail because it differs from the customary treatment in several respects. For one 
thing, the theoretical normal equations (19) are derived from the assumption that the 
theoretical relation (3) constitutes a conditional expectation plus a residual, whereas 
the normal equations (22) of the sample are derived from the criterion that the observed 
residual should have the smallest possible variance. In this way the key of the theorem 
is the assumption (12). As is well known (see Cramér (1945-46) and Wold and Juréen 
(1952-53)) this assumption is fulfilled in a variety of situations, notably (a) the case of 
independent replications of a controlled experiment, and (b) the case when the variables 
$y, x_1, \ldots, x_h$ are given as time series that are stationary and ergodic.

The novel features of the theorem involve some slight generalization. The main point is perhaps that residual noncorrelations

$$r(v, x_i) = 0 \quad (i = 1, \ldots, h) \quad (23)$$

usually are adopted as assumptions, while here, as seen from (16), they are obtained as implications of the basic assumption (4). Further it will be noted that independence between the residuals is not required. Thus in the case of stationary time series, for example, autocorrelation is permitted in the residuals and in the variates $x_1, \ldots, x_h$; see Wold and Juréen (1952-53) where the analysis covers the case when all variates $x_1, \ldots, x_h$ are exogenous, and Lyttkens (1963), where one or more of these variates are allowed to be endogenous, that is, are specified as the variate $y$ subject to a lag.


As to the rationale of the basic assumption (2), we note that in applied regression 
analysis the approach (2) is entirely in line with the use of the relation (1) for fore- 
casting purposes. To put it otherwise, the weak point of the customary assumptions 
(23) is that in most nonexperimental situations little or nothing known about the resi- 
duals and their correlation properties, whereas once we have decided for what fore- 


casting purpose we want to use a regression relation, the corresponding assumption 
(2) is required as a stochastic rationale for the forecasting procedure. 
The theorem has been stated without proof in earlier papers by the author
(Wold (1959-60)-(1963)). The naming of the relations (1)-(2) has been changed from
unbiased predictors (Wold, 1961) to eo ipso predictors (Wold, 1963). The [...] of
using conditional expectations in connection with forecasting [...] is
of old standing. Specific reference is made to the literature on interdependent systems
(see Haavelmo (1944), Hurwicz (1950) and many later works), [...]
is placed on the specification of forecasting relations in terms of conditional expectations,
but the analysis is given another twist by assuming that the variates [...]
the same structural specification in the past and the future.
ADDENDUM


Additional Note to the Paper “On Asymptotic Expansions For Sums Of 
Independent Random Variables With A Limiting Stable Distribution.” By 
Harald Cramér, Sankhya, Series A, 25, 13-24. 


While the above-mentioned paper was being printed, my attention was drawn to
the paper “On the magnitude of the error in the approach to stable distributions” by
M. Lipschutz, Proc. Koninkl. Nederl. Akad. Wetensch., A, 59, No. 3 (1956). In this
paper, the asymptotic properties of the distribution function F(x) are studied for the
case when the independent random variables x_1, x_2, ..., are positive and have a common
absolutely continuous distribution function F(x),
GENERALIZATION OF THE FISHER-DARMOIS-KOOPMAN-PITMAN
THEOREM ON SUFFICIENT STATISTICS


By EDWARD W. BARANKIN and ASHOK P. MAITRA
University of California, Berkeley 


[...] and now classical theorem (or, rather, theorems) referred to in
the title shows that, for a family of n-dimensional densities of product form, with identical
1-dimensional factor densities, the existence of a sufficient statistic of dimension < n is
essentially equivalent to the condition that the 1-dimensional factor density involved be of
exponential type (see below for more precise descriptions). In this article we drop the
feature that the factor densities be identical, and we obtain theorems again relating the
existence of lower dimensional sufficient statistics with the fact of exponential type for
the factor densities. Results of the classical type fall out as corollaries.
this smallest r, to relate the functions ψ_λ, λ = 1, 2, ..., r to any sufficient statistic of
dimension s (see Corollary 5.2 here below). Pitman does not confine himself to s = ν,
but his representation, like that of Darmois, has r = s. Pitman also is the only one
to consider the case in which the carrier of the density p_0(·, θ) varies with θ. The work
of Koopman is distinguished by a fully precise formulation of theorems and rigor
of proof: he is explicit about the assumption of analyticity of the function p_0.


The present paper is motivated by the question: to what extent can we generalize
the form ∏ p_0(x_i, θ) of the family of distribution densities in Euclidean n-space
and still find that the existence of a sufficient statistic of dimension s < n is equivalent
to an assertion of exponential type regarding the component distributions involved?
We shall show here that such theorems are obtainable when the factor densities in the
n-fold product above are not necessarily identical. The Darmois-Koopman-Pitman
result thus comes out as a special case.


We shall be concerned, then, with a family 𝒫_π of probability distributions in
Euclidean n-space that is specifically structured as follows. Let Ω_1, Ω_2, ..., Ω_n
be n 1-dimensional open sets, and set Ω = Ω_1 × Ω_2 × ... × Ω_n. Let Θ be a ν-dimensional
open set. For each θ ∈ Θ and each i = 1, 2, ..., n, let p_i(·, θ) be a strictly positive
probability density with respect to Lebesgue measure on Ω_i; and for each
i = 1, 2, ..., n, let the function p_i, of the variables ξ ∈ Ω_i and θ = (θ_1, θ_2, ..., θ_ν) ∈ Θ,
be continuous and have continuous partial derivatives

    \frac{\partial p_i}{\partial \xi}, \quad \frac{\partial p_i}{\partial \theta_j}, \quad
    \frac{\partial^2 p_i}{\partial \xi\, \partial \theta_j} = \frac{\partial^2 p_i}{\partial \theta_j\, \partial \xi},
    \qquad j = 1, 2, \ldots, \nu,

throughout Ω_i × Θ. For each x = (x_1, x_2, ..., x_n) ∈ Ω and θ ∈ Θ, let p(x, θ) be defined by

    p(x, \theta) = \prod_{i=1}^{n} p_i(x_i, \theta).        (1.2)

Then, our family 𝒫_π is that having the function p of (1.2) as one determination of its
family density function relative to Lebesgue measure on Ω. Notice that p(·, θ)
is strictly positive throughout Ω for each θ ∈ Θ, so that the densities of 𝒫_π all have the
common carrier Ω. Our present study does not, therefore, cover the case of a variable
carrier, the case whose investigation was initiated by Pitman. Indeed, our methods
here are drawn from the fundamental paper of Barankin and Katz (1959) and the
dimensionality problem for a variable carrier was not considered there; this problem
remains to be treated.


We have not assumed analyticity of the densities p_i to start with, merely
the continuous differentiability detailed above. With this lesser assumption we are
able first to prove the local results of the type announced; this we do in Section 3.
Theorem 3.1 gives, for a family 𝒫_π defined by (1.2), necessary and sufficient conditions
for the existence of a locally sufficient statistic which is locally Euclidean of dimension
s ≤ n and locally continuously differentiable. Theorem 3.2 considers that there is
given a statistic T which is Euclidean of dimension s ≤ n at a particular regular point
x°, and continuously differentiable about x°; and it states necessary and sufficient
conditions that T be sufficient for 𝒫_π about x°. Thus, the first theorem is purely
existential, while the second theorem addresses itself to a particular statistic in hand.
The latter point of view is that taken in Koopman’s (1936) investigation. The remainder 
of Section 3 presents two corollaries, which are the specializations of the two theorems 
to the classical case of identical factor densities. 

Our ultimate aim is to obtain global theorems corresponding to Theorems 
3.1 and 3.2, after invoking the additional hypothesis that the factor density functions 
are (partially or fully) analytic—the effectuating assumption made by Koopman (1936). 
Looking to this, we first devote Section 4 to deriving certain consequences of the
analyticity hypothesis; we do this in general, not restricting ourselves to product densities.
We are then ready to achieve the desired results in Section 5. Now, the analyticity
hypothesis is not, in itself, enough to secure the extension of the necessary and
sufficient conditions in the local case (Section 3) to necessary and sufficient conditions in
the global case; Theorem 4.1(iv) provides a condition that we are obliged to use in the
sufficiency part of our arguments. Consequently, in Section 5 we obtain two global
theorems corresponding to the single local Theorem 3.1: Theorem 5.1 gives necessary
conditions that there exists a statistic which is sufficient for 𝒫_π in Ω′ (the domain
of analyticity of p; in general, a subset of Ω), is Euclidean of dimension s < n in Ω′,
and is continuously differentiable about some regular point of Ω′; and Theorem
5.2 gives sufficient conditions. In a similar way, corresponding to the local Theorem
3.2 there are Theorems 5.4 and 5.5, the first of these giving global necessary
conditions and the second giving global sufficient conditions.

Corollaries 5.1 and 5.2 are the global correspondents of the local Corollaries
3.1 and 3.2, covering the specialization of our general product-density results to the
case of identical factors. The interesting fact [...] alluded to above,
that presented in (iv) of Theorem 4.1, is automatically satisfied in the case of identical
factor densities, and therefore the specializations of Theorems 5.1 and 5.2 to the
identical factor case fall together into a single statement of necessary and sufficient
conditions, this being Corollary 5.1. And likewise does Corollary 5.2 give necessary
and sufficient conditions in a single statement on specializing Theorems 5.4 and 5.5
to the case of identical factors.

Corollary 5.2 presents the classical Koopman (1936) results. However, our
context is somewhat different from that of Koopman: our definition (the modern
[...]) of a sufficient statistic is less stringent than that of Koopman, and the statistics
[...] continuous, as in Koopman’s treatment. The net [...]
is that we obtain, in Corollary 5.2, a single statement of necessary and sufficient
conditions, whereas Koopman was not able to achieve this in his context.

Theorem 5.3 gives an alternative set of sufficient conditions in the existence [...]

In Section 2 we have recapitulated the definitions and results of our past work,
on which the present investigation relies. In doing so, we have [...] statements
of results to make them more readily [...]. Thus, Section 2 renders
[...] article fairly well self-contained, and in addition will serve to shed some
[...] previous articles.
2. RESUME OF PAST RESULTS


The definitions and theorems we shall need in order to achieve the conclusions 
at which this article is aimed have been successively built up through the four papers 
(Barankin and Katz, 1959 and Barankin, 1960a, 1960b, 1961). In view of this 
dispersion of sources, and also because certain modifications of presentation are 
indicated for purposes of this present work, it is advisable to devote this section to a 
brief, organized setting out of previous results. We shall then be in the favourable 
position of having to refer subsequently only to this Section 2, thereby eliminating for 
the reader the necessity of collating statements out of different past works. 


Let 𝒫 be a family of probability measures on the Lebesgue subsets of an open
set Ω in a Euclidean n-space of points x = (x_1, x_2, ..., x_n). The family is parametrized
by a point θ = (θ_1, θ_2, ..., θ_ν) ranging over an open set Θ in a Euclidean ν-space. Each
measure in 𝒫 is absolutely continuous with respect to Lebesgue measure, and we
consider that there is a determination, p, of the family density function which is strictly
positive throughout Ω × Θ, and is such that all the derivatives

    \frac{\partial p}{\partial x_i}, \quad \frac{\partial p}{\partial \theta_j}, \quad
    \frac{\partial^2 p}{\partial x_i\, \partial \theta_j}, \quad \frac{\partial^2 p}{\partial \theta_j\, \partial x_i},
    \qquad i = 1, 2, \ldots, n; \; j = 1, 2, \ldots, \nu,

are continuous in Ω × Θ. (From the continuity of
the mixed second derivatives it follows that the order of differentiation is irrelevant.)


A function T on Ω is said to be Euclidean of dimension r at x° if there is a
neighbourhood of x° such that T maps this neighbourhood into a Euclidean r-space.
Similarly, we shall speak of T being Euclidean of dimension r in a given subset of Ω,
or in all of Ω.


Let T be Euclidean of dimension r at x°; specifically, let

    T(x) = (h_1(x), h_2(x), \ldots, h_r(x))        (2.1)

in some neighbourhood of x°. If the functions h_i, i = 1, 2, ..., r are continuously
differentiable (in some neighbourhood of x°) then we say that T is continuously
differentiable about x°.


Let T be Euclidean of dimension r at x°, being given by (2.1) in some
neighbourhood of x°. If T is continuously differentiable about x°, and the Jacobian matrix

    \left\| \frac{\partial h_i}{\partial x_j} \right\|, \qquad i = 1, 2, \ldots, r; \; j = 1, 2, \ldots, n,        (2.2)

is of rank r at x°, then we say that T is regular at x°.


The following lemma (stated as Lemma 1.1 in Barankin and Katz (1959)) is the
precise form (given by Bahadur (1954)) of the factorization theorem for sufficient
statistics, and provides the referent criterion for all our recent work.


Lemma 2.1: A necessary and sufficient condition that the statistic T of the
family 𝒫 be a sufficient statistic for 𝒫 is that there exists a nonnegative function f on
𝓡_T × Θ, and a nonnegative function g on Ω, such that (i) for each θ ∈ Θ, f(T(·), θ) is a
Lebesgue measurable function on Ω, (ii) g is Lebesgue measurable and (iii) for each
θ ∈ Θ, the equality

    p(x, \theta) = f(T(x), \theta)\, g(x)        (2.3)

holds for almost all (Lebesgue) x ∈ Ω.


In this statement, 𝓡_T denotes the range of the statistic T. We shall use also
the notation 𝓡_{T,B} to denote the range of the statistic T restricted to the domain B.
For a given measurable subset B of Ω, we may form from 𝒫 the family
of conditional probability distributions relative to B [...]
may be applied; and this will result in a factorization condition of the form
(2.3) restricted to B. Thus, global sufficiency for 𝒫 (that is, sufficiency in
Ω) is related to sufficiency for derived conditional distribution families [...] in the
fashion of reduced domain of validity of (2.3). It follows that for discussions concerning
the building up of globally sufficient statistics from functions which [...]
on restricted domains, we could elect to speak in terms of sufficiency for derived
conditional distribution families. But in fact there is no particular advantage for us in
doing this here, and we shall therefore take the simpler route, as in Barankin (1960b), of
defining restricted sufficiency directly in terms of the factorization criterion.


Definition 2.1: A statistic T of 𝒫 is said to be sufficient for 𝒫 in B if there is
a nonnegative function f on 𝓡_{T,B} × Θ, and a nonnegative function g on B, such that
(i) for each θ ∈ Θ, f(T(·), θ) is a Lebesgue measurable function on B, (ii) g is Lebesgue
measurable on B, and (iii) for each θ ∈ Θ, the equality

    p(x, \theta) = f(T(x), \theta)\, g(x)        (2.4)

holds for almost all (Lebesgue) x ∈ B.

It is handy to make also the following definition:

Definition 2.2: A statistic T of 𝒫 is said to be sufficient for 𝒫 about the
point x° ∈ Ω if it is sufficient for 𝒫 in some neighbourhood of x°.

The following lemma has been basic to all our work; it is Lemma 2.3 given by
Barankin and Katz (1959), but we quote here the revised form of it stated as Lemma
2.2 in Barankin (1960b).
Lemma 2.2: Let T be a statistic of 𝒫 which is Euclidean of dimension r at
x° and regular at x°, and which is sufficient for 𝒫 about x°.

Then there is a neighbourhood N of x° such that 𝓡_{T,N} is a neighbourhood of
T(x°), and there are nonnegative functions f, on 𝓡_{T,N} × Θ, and g, on N, such that

    p(x, \theta) = f(T(x), \theta)\, g(x) \quad \text{for all } x \in N, \; \theta \in \Theta,        (2.5)

and such that f(·, ·) has continuous partial derivatives

    \frac{\partial f}{\partial y_i}, \quad \frac{\partial f}{\partial \theta_j}, \quad
    \frac{\partial^2 f}{\partial y_i\, \partial \theta_j}, \quad \frac{\partial^2 f}{\partial \theta_j\, \partial y_i},
    \qquad i = 1, 2, \ldots, r; \; j = 1, 2, \ldots, \nu,

in 𝓡_{T,N} × Θ and g has continuous partial derivatives ∂g/∂x_i, i = 1, 2, ..., n, in N.
This lemma plays the crucial part of providing locally continuously differentiable
factorizing functions when a locally sufficient statistic is Euclidean and regular.
And it implies, moreover, that the factorization holds, for each θ ∈ Θ, not merely
almost everywhere in N, but everywhere in N.


Let x be any particular point of Ω. Let j_1, j_2, ..., j_n be any particular n
integers, each chosen from the set {1, 2, ..., ν}, and let θ^(1), θ^(2), ..., θ^(n) be any
particular n points of Θ, not necessarily all distinct. We define the n×n matrix

    L(x; j_1, j_2, \ldots, j_n; \theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(n)})
        = \left\| \left( \frac{\partial^2 \log p}{\partial x_i\, \partial \theta_{j_k}} \right)_{(x,\, \theta^{(k)})} \right\|,
    \qquad i, k = 1, 2, \ldots, n.        (2.6)

Let 𝓛_x denote the class of all matrices (2.6) for the given point x. We define
the integer-valued function ρ_1 on Ω, as follows:

    \rho_1(x) = \max_{L \in \mathcal{L}_x} (\operatorname{rank} L).        (2.7)

And if S_{x°,σ} denotes the open sphere in Ω of radius σ, centered at x°, we define another
integer-valued function ρ on Ω by

    \rho(x^0) = \lim_{\sigma \downarrow 0} \; \max_{x \in S_{x^0, \sigma}} \rho_1(x).        (2.8)

We have always

    \rho_1(x) \le \rho(x),        (2.9)


and we make the following definition : 


Definition 2.3: A point x is called a regular point of Ω (for the family 𝒫)
if equality holds in (2.9).


The set of regular points of Ω is denoted by R. The significant facts about
the set R are these:


Lemma 2.3: The set R of regular points of Ω is an everywhere dense, open
subset of Ω.


The function ρ is continuous on R; and, in fact, R is precisely the set of points
of continuity of the function ρ_1.
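
The rank function of (2.7) and the notion of a regular point can be made concrete on a small case. The sketch below is only an illustration under assumptions that are not in the paper: the family is taken to be n = 3 independent normal observations with parameter θ = (μ, v), v being the variance, and the maximum in (2.7) is taken over a small hand-picked set of parameter points rather than over all of Θ. For this family the mixed partials are ∂² log p/∂x_i∂μ = 1/v and ∂² log p/∂x_i∂v = (x_i − μ)/v². The helper names `rank`, `mixed_partial` and `rho_1` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def rank(matrix):
    """Exact rank of a matrix of Fractions, by Gaussian elimination."""
    rows = [list(r) for r in matrix]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(rk + 1, len(rows)):
            factor = rows[i][col] / rows[rk][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk


def mixed_partial(x_i, j, theta):
    """d^2 log p / dx_i dtheta_j for a normal factor; j = 0 is mu, j = 1 is v."""
    mu, v = theta
    return 1 / v if j == 0 else (x_i - mu) / v ** 2


def rho_1(x, thetas):
    """Largest rank among the matrices (2.6) built from the sampled columns."""
    n = len(x)
    columns = [(j, th) for j in (0, 1) for th in thetas]
    return max(
        rank([[mixed_partial(x[i], j, th) for (j, th) in choice] for i in range(n)])
        for choice in product(columns, repeat=n)
    )


# Assumed sample of parameter points (mu, v); any v > 0 would do.
thetas = [(Fraction(0), Fraction(1)), (Fraction(1), Fraction(2))]
generic = (Fraction(0), Fraction(1), Fraction(3))
diagonal = (Fraction(1), Fraction(1), Fraction(1))
print(rho_1(generic, thetas), rho_1(diagonal, thetas))
```

Every column of such a matrix is a combination of the vectors (1, 1, 1) and (x_1, x_2, x_3), so the rank is 2 at a point with unequal coordinates and drops to 1 on the diagonal x_1 = x_2 = x_3. Points with equal coordinates are therefore not regular for this family, since points of rank 2 lie arbitrarily close to them.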


The problem of minimal dimensionality of sufficient statistics has been solved
in terms of the function ρ. The solution of the local problem is given by the following
two theorems (which are Theorems 3.2 and 3.3 in Barankin and Katz (1959)).


Theorem 2.1: If T is a statistic of 𝒫 which is Euclidean of dimension r at
x° and continuously differentiable about x° (but not necessarily regular at x°), and if T
is sufficient for 𝒫 about x°, then r ≥ ρ(x°).


Theorem 2.2: If x° is a regular point of Ω, then there exists a sufficient statistic,
T, for 𝒫 which is Euclidean of dimension ρ(x°) at x°, and regular at x°.

The proof of the latter theorem gives an explicit construction of such a
sufficient statistic T. The following theorem gives the significant aspect of this
construction (we give a modified statement of Theorem 3 in Barankin (1960a) or Theorem
2.3 in Barankin (1960b)).
Theorem 2.3: Let x° be a regular point of Ω, and let r_0 = ρ(x°). Let
j_1, j_2, ..., j_{r_0} be integers, and θ^(1), θ^(2), ..., θ^(r_0) be points of Θ such that the matrix

    \left\| \left( \frac{\partial^2 \log p}{\partial x_i\, \partial \theta_{j_k}} \right)_{(x^0,\, \theta^{(k)})} \right\|,
    \qquad i = 1, 2, \ldots, n; \; k = 1, 2, \ldots, r_0,        (2.10)

is of rank r_0. Let the functions η_k on Ω accordingly be defined by

    \eta_k(x) = \left( \frac{\partial \log p(x, \theta)}{\partial \theta_{j_k}} \right)_{\theta = \theta^{(k)}},
    \qquad k = 1, 2, \ldots, r_0.        (2.11)

Then the statistic T_0, given by

    T_0(x) = (\eta_1(x), \eta_2(x), \ldots, \eta_{r_0}(x)),        (2.12)

is sufficient for 𝒫 about x°.
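
As a rough illustration of the construction (2.11)-(2.12), the sketch below evaluates the two logarithmic derivatives for a normal sample with θ = (μ, v), v the variance. The family, the parameter points `THETA_1` and `THETA_2`, and the sample points are assumptions chosen for the example, not taken from the paper. For this family ∂ log p/∂μ = Σ (x_i − μ)/v and ∂ log p/∂v = Σ [(x_i − μ)²/(2v²) − 1/(2v)], so the resulting statistic depends on x only through Σ x_i and Σ x_i².

```python
from fractions import Fraction

THETA_1 = (Fraction(0), Fraction(1))   # assumed theta^(1), paired with j_1 = mu
THETA_2 = (Fraction(1), Fraction(2))   # assumed theta^(2), paired with j_2 = v


def eta_1(x):
    """d log p / d mu evaluated at theta^(1)."""
    mu, v = THETA_1
    return sum((xi - mu) / v for xi in x)


def eta_2(x):
    """d log p / d v evaluated at theta^(2)."""
    mu, v = THETA_2
    return sum((xi - mu) ** 2 / (2 * v ** 2) - 1 / (2 * v) for xi in x)


def t_0(x):
    """The statistic (2.12) for this example: the pair (eta_1, eta_2)."""
    return (eta_1(x), eta_2(x))


x = tuple(Fraction(a) for a in (0, 3, 3))
y = tuple(Fraction(a) for a in (1, 1, 4))   # same sum and sum of squares as x
z = tuple(Fraction(a) for a in (0, 0, 6))   # same sum, different sum of squares

print(t_0(x) == t_0(y), t_0(x) == t_0(z))
```

Two sample points with the same sum and the same sum of squares are mapped to the same value, while a point that differs in the sum of squares is separated, as one expects of the familiar two-dimensional sufficient statistic for this family.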
Local minimal dimensionality, as presented by Theorems 2.1 and 2.2 above,
is achievable in a single globally sufficient statistic simultaneously almost everywhere
in R. This is the assertion of Theorem 4.1 in Barankin and Katz (1959), and we
restate it here:


Theorem 2.4: There exists a sufficient statistic, T*, for 𝒫 which, for almost
all (Lebesgue) points x° ∈ R, is Euclidean of dimension ρ(x°) at x°, and regular at x°.

The statistic T* of this theorem is built up (see Barankin and Katz, 1959) from
the locally sufficient pieces of the statistics T_0 of Theorem 2.3 above. These pieces, in
addition to being dimensionally minimal, are functionally minimal. This fact gave us
Theorem 4.3 of Barankin and Katz (1959), which was restated as Theorem 2.5 in
Barankin (1960b). But the formulation of the mentioned Theorem 4.3 is somewhat
in error, and this was corrected in Barankin (1961). The correction was effected by
proving, in fact, the following result:

Theorem 2.5: Let j_0 be any particular integer between 1 and ν inclusive, and
θ° be any particular point of Θ; and define

    \eta(x) = \left( \frac{\partial \log p(x, \theta)}{\partial \theta_{j_0}} \right)_{\theta = \theta^0}.        (2.13)

If T is any sufficient statistic for 𝒫, then η is almost everywhere a function of
T in Ω.

The correct assertion regarding functional minimality of the statistic T* is
consequently the following (see Section 2 of Barankin, 1961):

Theorem 2.6: The 𝒫-sufficient statistic T* of Theorem 2.4 is locally almost
everywhere (Lebesgue) functionally minimal at almost all (Lebesgue) points of R. That
is, for each point x in a set A ⊂ R, with R − A of measure 0, there is a neighbourhood N
of x such that if T is any sufficient statistic then there is a subset C_N of N of measure 0,
such that T*(x′) = T*(x″) whenever x′ and x″ are points of N − C_N with T(x′) = T(x″).
u i 
3. THE LOCAL RESULTS FOR A FAMILY OF TYPE 𝒫_π

Theorem 3.1 below is our basic result relating the existence of a sufficient
statistic of dimension s ≤ n to the fact of exponential type of the component densities,
for a family of type 𝒫_π defined in (1.2). In anticipation of the proof, we first establish
the result of Lemma 3.2 below.

We shall say that there is a constant linear relation among finitely many given,
real-valued functions on Θ, say c_1, c_2, ..., c_r, if there exist constants α_0, α_1, ..., α_r,
not all 0, such that

    \sum_{i=1}^{r} \alpha_i c_i(\theta) = \alpha_0, \qquad \theta \in \Theta.        (3.1)
i=l 


The following lemma has been proved in Barankin (1961), and we quote it here 
without proof. 


Lemma 3.1: There is no constant linear relation among the functions c_1, c_2, ..., c_r
on Θ if and only if there exist integers j_1, j_2, ..., j_r, each between 1 and ν inclusive, and
corresponding points θ^(1), θ^(2), ..., θ^(r) of Θ such that the matrix

    \left\| \left( \frac{\partial c_i}{\partial \theta_{j_k}} \right)_{\theta = \theta^{(k)}} \right\|,
    \qquad i, k = 1, 2, \ldots, r,        (3.2)

is nonsingular.
We now have the following lemma. 


Lemma 3.2: Let c_i, i = 1, 2, ..., r be real-valued functions on Θ and β_{λi},
λ = 1, 2, ..., r; i = 0, 1, 2, ..., r, be constants; let

    b_\lambda(\theta) = \beta_{\lambda 0} + \sum_{i=1}^{r} \beta_{\lambda i}\, c_i(\theta), \qquad \theta \in \Theta.        (3.3)

Then, there is no constant linear relation among the functions b_λ, λ = 1, 2, ..., r, if and
only if the matrix ||β_{λi}||, λ, i = 1, 2, ..., r is nonsingular and there is no constant linear
relation among the functions c_i, i = 1, 2, ..., r.


Proof: For any integers j_1, j_2, ..., j_r, each between 1 and ν inclusive, and any
points θ^(1), θ^(2), ..., θ^(r) in Θ, we have from (3.3) that

    \left\| \left( \frac{\partial b_\lambda}{\partial \theta_{j_k}} \right)_{\theta^{(k)}} \right\|
        = \left\| \beta_{\lambda i} \right\| \cdot
          \left\| \left( \frac{\partial c_i}{\partial \theta_{j_k}} \right)_{\theta^{(k)}} \right\|,        (3.4)

where the indices λ, k, i all range from 1 to r. The product matrix on the right is
nonsingular if and only if both factors are nonsingular. It follows that there exist j_k’s
and θ^(k)’s such that the matrix on the left is nonsingular if and only if ||β_{λi}|| is
nonsingular and there exist j_k’s and θ^(k)’s such that the second factor on the right is
nonsingular. Then, applying Lemma 3.1, we have the asserted result.
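
The matrix factorization used in this proof can be checked on a toy case. The sketch below assumes, purely for illustration, ν = 1, r = 2, c_1(θ) = θ and c_2(θ) = θ², together with two hand-picked constant matrices `beta_ok` and `beta_bad`; none of these choices comes from the paper.

```python
from fractions import Fraction


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def matmul2(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def dc(theta):
    """Derivatives of c_1(theta) = theta and c_2(theta) = theta**2."""
    return [Fraction(1), 2 * theta]


def db(beta, theta):
    """Derivatives of b_l = beta_l0 + beta_l1 c_1 + beta_l2 c_2; beta_l0 drops out."""
    d = dc(theta)
    return [beta[l][0] * d[0] + beta[l][1] * d[1] for l in range(2)]


points = [Fraction(1), Fraction(3)]   # assumed theta^(1), theta^(2)

# Derivative matrix of the c_i: rows indexed by i, columns by k.
C = [[dc(points[k])[i] for k in range(2)] for i in range(2)]


def B(beta):
    """Derivative matrix of the b_lambda: rows indexed by lambda, columns by k."""
    return [[db(beta, points[k])[l] for k in range(2)] for l in range(2)]


beta_ok = [[Fraction(1), Fraction(2)], [Fraction(3), Fraction(4)]]    # nonsingular
beta_bad = [[Fraction(1), Fraction(2)], [Fraction(2), Fraction(4)]]   # singular

print(B(beta_ok) == matmul2(beta_ok, C), det2(B(beta_ok)), det2(B(beta_bad)))
```

With the nonsingular coefficient matrix the derivative matrix of the b_λ is the product of the two factors and stays nonsingular; with the singular one, b_2 = 2 b_1 is a constant linear relation and the determinant vanishes whatever points are chosen.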


We shall say of the expression (1.1) that it is in reduced form if there is no
constant linear relation among the functions b_λ, λ = 1, 2, ..., r. With this, we are now ready
to state and prove the basic theorem.
Theorem 3.1: Let 𝒫_π be the family of distributions defined in (1.2), and
let x° = (x°_1, x°_2, ..., x°_n) be a regular point of Ω for the family 𝒫_π.

A necessary and sufficient condition that there exists a sufficient statistic for 𝒫_π
about x° which is Euclidean of dimension s ≤ n at x° and continuously differentiable about
x° is that for some integer r ≤ s there be n−r of the factor densities, say p_{r+1}, p_{r+2}, ..., p_n,
with the following three properties:

(i) they are all of local exponential type as follows:

    \log p_m(\xi, \theta) = b^{(m)}(\theta) + \psi^{(m)}(\xi) + \sum_{\lambda=1}^{r} b_\lambda(\theta)\, \psi_\lambda^{(m)}(\xi),        (3.5)

    ξ ∈ some neighbourhood of x°_m; θ ∈ Θ; m = r+1, r+2, ..., n,

with all functions appearing in these expressions being continuously differentiable;

(ii) there is no constant linear relation among the r common parametric functions
b_λ, λ = 1, 2, ..., r, that enter into (3.5): that is, every one of the n−r expressions in (3.5)
is in reduced form;

(iii) the functions b_λ, λ = 1, 2, ..., r are constant linear combinations of the
logarithmic derivatives, evaluated at x°, of the remaining r factor densities, thus:

    b_\lambda(\theta) = \beta_{\lambda 0} + \sum_{i=1}^{r} \beta_{\lambda i}
        \left( \frac{\partial \log p_i(x_i, \theta)}{\partial x_i} \right)_{x_i = x_i^0},
    \qquad \lambda = 1, 2, \ldots, r,        (3.6)

wherein the matrix ||β_{λi}|| is nonsingular.

For r = 0 these conditions are to be understood as follows: (3.6) is vacuous and
the r-term sum on the right-hand side of (3.5) is 0.

The integer r for which the above properties are verified is unique and is precisely
the local minimal dimension of a continuously differentiable, Euclidean sufficient statistic
for 𝒫_π about x°.
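
A small worked case may help fix the meaning of the conditions with non-identical factors. The sketch below is an illustration under assumptions not found in the paper: the i-th factor is taken to be a normal density with unknown mean θ and known variance v_i, with the v_i differing from factor to factor. Then log p_m(ξ, θ) = −θ²/(2v_m) + [−ξ²/(2v_m) − ½ log(2πv_m)] + θ·ξ/v_m, which has the form (3.5) with r = 1, b_1(θ) = θ and ψ_1^{(m)}(ξ) = ξ/v_m; and b_1(θ) = x°_1 + v_1 (∂ log p_1/∂x_1) at x_1 = x°_1, which has the form (3.6). The code checks the corresponding one-dimensional statistic T(x) = Σ x_i/v_i through the likelihood ratio: log p(x, θ) − log p(y, θ) should be free of θ exactly when T(x) = T(y).

```python
from fractions import Fraction

VARIANCES = (Fraction(1), Fraction(2), Fraction(4))   # assumed known v_i, one per factor


def statistic(x):
    """T(x) = sum of x_i / v_i, the candidate 1-dimensional sufficient statistic."""
    return sum(xi / vi for xi, vi in zip(x, VARIANCES))


def log_ratio(x, y, theta):
    """log p(x, theta) - log p(y, theta); the normalizing constants cancel."""
    return sum(
        (-(xi - theta) ** 2 + (yi - theta) ** 2) / (2 * vi)
        for xi, yi, vi in zip(x, y, VARIANCES)
    )


x = tuple(Fraction(a) for a in (1, 2, 3))
y = tuple(Fraction(a) for a in (0, 3, 5))    # statistic(y) equals statistic(x)
z = tuple(Fraction(a) for a in (0, 0, 0))    # statistic(z) differs from statistic(x)
thetas = [Fraction(t) for t in (-2, 0, 1, 5)]

same = {log_ratio(x, y, t) for t in thetas}
diff = {log_ratio(x, z, t) for t in thetas}
print(len(same), len(diff))
```

The log ratio takes a single value over the sampled θ when the two points share the value of T, and a different value for each θ otherwise, in line with the factorization criterion.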

Proof: We shall first establish the last statement of the theorem, namely,
that r = ρ(x°) (see Section 2).

If r = 0 we see that each p_i, i = 1, 2, ..., n is a product of a function of θ
and a function of x_i, for all θ ∈ Θ and all x_i in some neighbourhood of x°_i. It follows
that p itself is a product of a function of θ and a function of x, for all θ ∈ Θ and all x
in some neighbourhood of x°. In this case (see Definition [number illegible]) any constant
function in a neighbourhood of x° provides a continuously differentiable, Euclidean sufficient
statistic for 𝒫_π about x°. Thus, ρ(x°) = 0 = r.

Now suppose r > 0. Since, by (ii), there is no constant linear relation among
the b_λ, λ = 1, 2, ..., r, it follows, on applying Lemma 3.2 to (3.6), both that ||β_{λi}|| is
nonsingular (so that this assertion in (iii) is a consequence of (ii)) and that the r
functions on Θ

    \left( \frac{\partial \log p_i(x_i, \theta)}{\partial x_i} \right)_{x_i = x_i^0}, \qquad i = 1, 2, \ldots, r,        (3.7)

have no constant linear relation among them. By Lemma 3.1, then, there exist
integers j_1, j_2, ..., j_r and points θ^(1), θ^(2), ..., θ^(r) in Θ such that the matrix

    \left\| \left( \frac{\partial^2 \log p_i(x_i, \theta)}{\partial x_i\, \partial \theta_{j_k}} \right)_{(x_i^0,\, \theta^{(k)})} \right\|,
    \qquad i, k = 1, 2, \ldots, r,        (3.8)

is nonsingular. If we notice that (since p is the product of the p_i and each variable
x_i enters into p only through the single factor p_i) we have

    \frac{\partial \log p}{\partial x_i} = \frac{\partial \log p_i}{\partial x_i}, \qquad i = 1, 2, \ldots, n,        (3.9)

then we see that (3.8) is identical with the matrix

    \left\| \left( \frac{\partial^2 \log p}{\partial x_i\, \partial \theta_{j_k}} \right)_{(x^0,\, \theta^{(k)})} \right\|,
    \qquad i, k = 1, 2, \ldots, r,        (3.10)

which is therefore nonsingular. Recalling the definition of ρ in (2.8), we conclude
that ρ(x°) ≥ r.


To establish the reverse inequality, let us consider the second mixed partial
derivative of log p with respect to x_m and θ_j, for m ≥ r+1. Taking account of (3.9),
we obtain this from (3.5). If, moreover, we substitute for the b_λ from (3.6), and again
use (3.9), then we have, on evaluating at x = x°,

    \left( \frac{\partial^2 \log p}{\partial x_m\, \partial \theta_j} \right)_{x^0}
        = \sum_{\lambda=1}^{r} \sum_{i=1}^{r}
          \left( \frac{d \psi_\lambda^{(m)}}{d \xi} \right)_{\xi = x_m^0} \beta_{\lambda i}
          \left( \frac{\partial^2 \log p}{\partial x_i\, \partial \theta_j} \right)_{x^0},        (3.11)

    m = r+1, r+2, ..., n; j = 1, 2, ..., ν.

From these relations we have the following consequence: for any set of integers
j_1, j_2, ..., j_n and any points θ^(1), θ^(2), ..., θ^(n), the matrix

    \left\| \left( \frac{\partial^2 \log p}{\partial x_i\, \partial \theta_{j_k}} \right)_{(x^0,\, \theta^{(k)})} \right\|,
    \qquad i, k = 1, 2, \ldots, n,        (3.12)

is such that each of the last n−r rows is a linear combination of the first r rows. It
follows that the rank of this matrix is not greater than r. By the definition of the
function ρ_1 (see (2.7)) we have, therefore, that ρ_1(x°) ≤ r. But x° is a regular point,
so that ρ(x°) = ρ_1(x°). Hence, we have ρ(x°) ≤ r, and this, combined with the reverse
inequality derived above, establishes the equality ρ(x°) = r, as was asserted.


We now turn to the proof of necessity of the conditions (i), (ii) and (iii).
Suppose there exists a sufficient statistic for 𝒫_π about the regular point x°, which is
Euclidean of dimension s ≤ n at x° and continuously differentiable about x°. Then,
by Theorem 2.1, we have ρ(x°) ≤ s ≤ n. For brevity, let us set r = ρ(x°). Consider
first the case of r = 0. In this case we have, by virtue of the definition of ρ_1,

    \left( \frac{\partial^2 \log p}{\partial x_i\, \partial \theta_j} \right)_{(x^0,\, \theta)} = 0,
    \qquad \theta \in \Theta; \; i = 1, 2, \ldots, n; \; j = 1, 2, \ldots, \nu.        (3.13)

But, by Lemma 2.3, ρ is identically 0 in a neighbourhood of x°. Choosing a rectangular
neighbourhood, we have, therefore, the following stronger statement:

    \frac{\partial^2 \log p}{\partial x_i\, \partial \theta_j} = 0,
    \qquad x \in N_1 \times N_2 \times \cdots \times N_n; \; \theta \in \Theta,        (3.14)

    i = 1, 2, ..., n; j = 1, 2, ..., ν,

where, for each i, N_i is a 1-dimensional neighbourhood of x°_i. Bringing (3.9) to bear,
the equations (3.14) become

    \frac{\partial^2 \log p_i(\xi, \theta)}{\partial \xi\, \partial \theta_j} = 0,
    \qquad \xi \in N_i, \; \theta \in \Theta.        (3.15)

From these it follows that

    \log p_i(\xi, \theta) = b^{(i)}(\theta) + \psi^{(i)}(\xi),
    \qquad \xi \in N_i, \; \theta \in \Theta, \; i = 1, 2, \ldots, n,        (3.16)

with the functions b^{(i)} and ψ^{(i)} being continuously differentiable, by virtue of the
differentiability properties of the p_i. This completes the proof of necessity for r = 0.
Consider r > 0. By a re-indexing of the components x_1, x_2, ..., x_n if necessary,
we have that there exist r points of Θ, say θ^(1), θ^(2), ..., θ^(r), and integers j_1, j_2, ..., j_r
such that

    \det \left\| \left( \frac{\partial^2 \log p}{\partial x_i\, \partial \theta_{j_k}} \right)_{(x^0,\, \theta^{(k)})} \right\|_{i, k = 1, 2, \ldots, r} \neq 0,        (3.17)

while

    \begin{vmatrix}
      \left( \dfrac{\partial^2 \log p}{\partial x_i\, \partial \theta_{j_k}} \right)_{(x^0,\, \theta^{(k)})}
        & \left( \dfrac{\partial^2 \log p}{\partial x_i\, \partial \theta_j} \right)_{(x^0,\, \theta)} \\[2ex]
      \left( \dfrac{\partial^2 \log p}{\partial x_m\, \partial \theta_{j_k}} \right)_{(x^0,\, \theta^{(k)})}
        & \left( \dfrac{\partial^2 \log p}{\partial x_m\, \partial \theta_j} \right)_{(x^0,\, \theta)}
    \end{vmatrix} = 0, \qquad i, k = 1, 2, \ldots, r,        (3.18)

for all θ ∈ Θ, every j = 1, 2, ..., ν, and every m = r+1, r+2, ..., n.

Consider any particular fixed integer m between r+1 and n, inclusive. Again by virtue
of Lemma 2.3, we have that the determinant (3.18) vanishes identically in θ and j
not only at x° but for all x in some neighbourhood of x°. (Since the mixed partial
derivatives of p are continuous we can furthermore choose this neighbourhood so small
that the determinant (3.17) is nonzero throughout the neighbourhood, not just at x°.)
We may take this neighbourhood to be a direct product of a 1-dimensional neighbourhood,
N_m, of x°_m, and an (n−1)-dimensional neighbourhood of (x°_1, ..., x°_{m−1}, x°_{m+1}, ..., x°_n). On
doing this, and restricting attention to only the points (x°_1, ..., x°_{m−1}, ξ, x°_{m+1}, ..., x°_n)
for ξ ∈ N_m, we get, in place of (3.18), after utilizing (3.9),

    \begin{vmatrix}
      \left( \dfrac{\partial^2 \log p_i(x_i, \theta^{(k)})}{\partial x_i\, \partial \theta_{j_k}} \right)_{x_i^0}
        & \left( \dfrac{\partial^2 \log p_i(x_i, \theta)}{\partial x_i\, \partial \theta_j} \right)_{x_i^0} \\[2ex]
      \dfrac{\partial^2 \log p_m(\xi, \theta^{(k)})}{\partial \xi\, \partial \theta_{j_k}}
        & \dfrac{\partial^2 \log p_m(\xi, \theta)}{\partial \xi\, \partial \theta_j}
    \end{vmatrix} = 0, \qquad i, k = 1, 2, \ldots, r,        (3.19)

for all ξ ∈ N_m, θ ∈ Θ and j = 1, 2, ..., ν.
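The r×r subdeterminant in the upper left-hand corner of (3.19) is precisely the
determinant (3.17), and is thus nonvanishing. If we expand (3.19) in terms of cofactors
of the last row, we obtain equations of the form

    \delta\, \frac{\partial^2 \log p_m(\xi, \theta)}{\partial \xi\, \partial \theta_j}
        + \sum_{k=1}^{r} \delta_{kj}(\theta)\, \frac{\partial^2 \log p_m(\xi, \theta^{(k)})}{\partial \xi\, \partial \theta_{j_k}} = 0,        (3.20)

    ξ ∈ N_m, θ ∈ Θ, j = 1, 2, ..., ν,

where δ and δ_{kj} are the cofactors in question, and δ in particular is a nonvanishing
constant. On dividing (3.20) by δ and setting

    \psi_\lambda^{(m)}(\xi) = \left( \frac{\partial \log p_m(\xi, \theta)}{\partial \theta_{j_\lambda}} \right)_{\theta = \theta^{(\lambda)}},
    \qquad b_{\lambda j}(\theta) = -\frac{\delta_{\lambda j}(\theta)}{\delta},
    \qquad \lambda = 1, 2, \ldots, r; \; \xi \in N_m,        (3.21)

we may write

    \frac{\partial^2 \log p_m(\xi, \theta)}{\partial \xi\, \partial \theta_j}
        = \sum_{\lambda=1}^{r} b_{\lambda j}(\theta)\, \frac{d \psi_\lambda^{(m)}(\xi)}{d \xi},        (3.22)

    ξ ∈ N_m, θ ∈ Θ, j = 1, 2, ..., ν.

Integration of these equations with respect to ξ is immediate, and we get

    \frac{\partial \log p_m(\xi, \theta)}{\partial \theta_j}
        = b_{0j}^{(m)}(\theta) + \sum_{\lambda=1}^{r} b_{\lambda j}(\theta)\, \psi_\lambda^{(m)}(\xi),
    \qquad \xi \in N_m, \; \theta \in \Theta, \; j = 1, 2, \ldots, \nu.        (3.23)

Now, tracing the b_{λj} back to their definition in (3.19), we see that if we define

    b_\lambda(\theta) = \beta_{\lambda 0} + \frac{(-1)^{r+\lambda}}{\delta}\,
        \det \left[ \, (d_{ik})_{k \neq \lambda} \;\Big|\;
        \left( \frac{\partial \log p_i(x_i, \theta)}{\partial x_i} \right)_{x_i = x_i^0} \right]_{i = 1, 2, \ldots, r},
    \qquad \lambda = 1, 2, \ldots, r,        (3.24)

in which d_{ik} abbreviates the (i, k) entry of the determinant (3.17), the columns k ≠ λ are
kept in their original order, and the column of logarithmic derivatives is placed last,
and where the β_{λ0} are any particular constants, then we have precisely

    \frac{\partial b_\lambda(\theta)}{\partial \theta_j} = b_{\lambda j}(\theta),
    \qquad j = 1, 2, \ldots, \nu; \; \lambda = 1, 2, \ldots, r.        (3.25)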
It follows, therefore, that equations (3.23) imply the existence of functions b^{(m)}(θ)
and ψ^{(m)}(ξ) such that

    \log p_m(\xi, \theta) = b^{(m)}(\theta) + \psi^{(m)}(\xi) + \sum_{\lambda=1}^{r} b_\lambda(\theta)\, \psi_\lambda^{(m)}(\xi),
    \qquad \xi \in N_m, \; \theta \in \Theta.        (3.26)

The collection of these results for m = r+1, r+2, ..., n is exactly (3.5), and the b_λ,
λ = 1, 2, ..., r, here are, as asserted, independent of m. It is clear from (3.21) and
(3.24) that the b_λ and the ψ_λ^{(m)}, λ = 1, 2, ..., r, are continuously differentiable. This
fact together with the differentiability properties of the function on the left of (3.26)
implies that also b^{(m)} and ψ^{(m)} are continuously differentiable.

It remains to establish assertions (ii) and (iii) of the theorem. To prove
(ii), let us take the integers j_1, j_2, ..., j_r and the points θ^(1), θ^(2), ..., θ^(r) that enter
into the determinant δ, which is the r×r subdeterminant in the upper left corner
of (3.19). Using (3.24) directly for our evaluations, we readily find the following
result:

    \left( \frac{\partial b_\lambda}{\partial \theta_{j_k}} \right)_{\theta = \theta^{(k)}} = \delta_{\lambda k},
    \qquad \lambda, k = 1, 2, \ldots, r,        (3.27)

where δ_{λk} is the Kronecker delta. Thus, for these j_k and θ^(k), the matrix

    \left\| \left( \frac{\partial b_\lambda}{\partial \theta_{j_k}} \right)_{\theta = \theta^{(k)}} \right\|        (3.28)

is nonsingular. Therefore, by Lemma 3.1, there is no constant linear relation among
the functions b_1, b_2, ..., b_r, and this is what was to be shown.

Finally, to prove (iii), we have only to expand the determinant in (3.24) in terms
of cofactors of the last column. And we have already seen above that the nonsingularity
of the matrix ||β_{λi}|| in (3.6) is a consequence of (ii).
We have now completed our demonstration of the necessity of the conditions
(i), (ii) and (iii) in the theorem. We consider finally the proof of sufficiency of these
conditions, which can now be given with dispatch.

Suppose conditions (i), (ii) and (iii) are satisfied for some r ≤ s. It has already
been shown that then r = ρ(x°). Therefore, by Theorem 2.2, there is a sufficient
statistic for 𝒫_π about x° which is Euclidean of dimension r and continuously
differentiable about x°. If r = s the proof is hereby complete. If r < s then we simply
adjoin any s−r continuously differentiable, real-valued functions to the sufficient
statistic just found, and this provides a local sufficient statistic of dimension s, as
asserted.

Theorem 3.1 is finished.

The point of view taken by Koopman in his theorems is not, as in our theorem
above, to consider the question of existence exclusively. It is rather to take a given
statistic of dimension s and to ask after conditions that it be a sufficient statistic.
Our next theorem is cast in these terms, and is the direct local generalization of
Koopman’s results to the case of nonidentical factor densities. (It will be noticed that
we assume the statistic T to be locally continuously differentiable, while Koopman 
assumes merely continuity. What accounts for this is that Koopman’s definition of 
a sufficient statistic is more stringent than the currently accepted definition, an 
equivalent form of which is given in Lemma 2.1. Thus, he has been able to argue on 
the weaker hypothesis of continuity of a sufficient statistic, whereas we have required 
continuous differentiability in order to achieve the fundamental Lemma 2.2, which 


enables the analysis to proceed from the weaker, modern definition of a sufficient 
statistic.) 


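Theorem 3.2: Let $\mathcal{P}_\Pi$ be the family of distributions defined in (1.2), and let
$x^0 = (x_1^0, x_2^0, \ldots, x_n^0)$ be a regular point of $\Omega$ for the family $\mathcal{P}_\Pi$. Let $T$ be a
statistic of $\mathcal{P}_\Pi$ which is Euclidean of dimension $s < n$ at $x^0$ and continuously
differentiable about $x^0$.

A necessary and sufficient condition that $T$ be sufficient for $\mathcal{P}_\Pi$ about $x^0$ is that for
some integer $r \le s$ there be $n-r$ of the factor densities, say $p_{r+1}, p_{r+2}, \ldots, p_n$, for which
the properties (i), (ii) and (iii) of Theorem 3.1 hold, and furthermore that if $r > 0$ and if
$j_1, j_2, \ldots, j_r$ and $\theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(r)}$ are some particular chosen sets of integers and points
of $\Theta$ such that

\[ \det \left\| \left( \frac{\partial^2 \log p_i(\xi, \theta)}{\partial \xi \, \partial \theta_{j_k}} \right)_{\xi = x_i^0,\ \theta = \theta^{(k)}} \right\|_{i, k = 1, 2, \ldots, r} \neq 0 \tag{3.29} \]

(and these will exist), then the statistics

\[ \sum_{i=1}^{n} \left( \frac{\partial \log p_i(x_i, \theta)}{\partial \theta_{j_k}} \right)_{\theta = \theta^{(k)}}, \qquad k = 1, 2, \ldots, r, \tag{3.30} \]

are functions of $T$ almost everywhere (Lebesgue) in some neighbourhood of $x^0$.

Of the functions

\[ \psi_k^{(i)}(\xi) = \left( \frac{\partial \log p_i(\xi, \theta)}{\partial \theta_{j_k}} \right)_{\theta = \theta^{(k)}}, \qquad i = 1, 2, \ldots, n;\ k = 1, 2, \ldots, r, \tag{3.31} \]

that enter into (3.30), those with index $i \ge r+1$ are precisely the functions $\psi_\lambda^{(m)}$,
$\lambda = 1, 2, \ldots, r$; $m = r+1, r+2, \ldots, n$, that figure in one of the possible sets of local
representations (3.5) of the $p_m(\xi, \theta)$, $m = r+1, r+2, \ldots, n$.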

Proof: We first prove necessity. If $T$ is a sufficient statistic for $\mathcal{P}_\Pi$ about
$x^0$, then by Theorem 3.1 there is an integer $r \le s$ such that, with suitable re-indexing
of the $p_i$ if necessary, the factor densities $p_{r+1}, p_{r+2}, \ldots, p_n$ have the properties
(i), (ii) and (iii) of that theorem. Suppose $r > 0$. According to Theorem 3.1, $r$ is
precisely $\rho(x^0)$, and in the necessity proof in that theorem we have seen that we may
select integers $j_1, j_2, \ldots, j_r$ and points $\theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(r)}$ of $\Theta$ such that (3.29) holds,
and then a set of representations (3.5) is determined, via (3.21) and (3.24), with these
$j_k$ and $\theta^{(k)}$. Let some particular such collection of $j_k$ and $\theta^{(k)}$ be chosen. By
Theorem 2.3, the functions

\[ \eta_k(x) = \left( \frac{\partial \log p(x, \theta)}{\partial \theta_{j_k}} \right)_{\theta = \theta^{(k)}}, \qquad k = 1, 2, \ldots, r, \tag{3.32} \]

constitute a sufficient statistic for $\mathcal{P}_\Pi$ about $x^0$, which is Euclidean of minimal
dimension $r$ at $x^0$ and continuously differentiable about $x^0$. By (3.9) we have

\[ \eta_k(x) = \sum_{i=1}^{n} \left( \frac{\partial \log p_i(x_i, \theta)}{\partial \theta_{j_k}} \right)_{\theta = \theta^{(k)}}, \qquad k = 1, 2, \ldots, r. \tag{3.33} \]

Theorem 2.5 asserts that $(\eta_1, \eta_2, \ldots, \eta_r)$ is locally almost everywhere (Lebesgue)
functionally minimal at $x^0$. Hence, it is a function of $T$ almost everywhere in some
neighbourhood of $x^0$. This statement is, as we see through (3.33), the asserted result that the
functions (3.30) are functions of $T$ almost everywhere about $x^0$.

Finally, (3.21) in the necessity proof of Theorem 3.1 immediately establishes
the last statement of our present theorem. And with this the proof of necessity is
complete.

We must now prove the sufficiency of the conditions of our theorem. Having
(i), (ii) and (iii) of Theorem 3.1 satisfied, we have, by that theorem, that $r = \rho(x^0)$,
and furthermore, from the details in the proof of this fact we know that there exist
$j_1, j_2, \ldots, j_r$ and $\theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(r)}$ such that (3.29) holds. It then follows that the $\eta_k$
in (3.33), which is to say the functions (3.30), constitute a sufficient statistic for $\mathcal{P}_\Pi$
about $x^0$. It is then a consequence of Definition 2.2 that since these $\eta_k$ are functions
of $T$ almost everywhere in some neighbourhood of $x^0$, $T$ is itself a sufficient statistic for
$\mathcal{P}_\Pi$ about $x^0$.

We have completed the proof of Theorem 3.2.

We may now give two corollaries, one of each of the above two theorems,
which specialize the results to the classical case of identical factor densities.

Corollary 3.1: Let $\mathcal{P}_{\Pi I}$ be the family of distributions defined in (1.2), wherein,
in particular, the $p_i$, $i = 1, 2, \ldots, n$, are identically the same function, say $p_0$. Let $\Omega_0$
be a standard copy of the identical open sets $\Omega_1, \Omega_2, \ldots, \Omega_n$ (see the Introduction), so that
$p_0(\xi, \theta)$ is defined on $\Omega_0 \times \Theta$. Let $x^0 = (x_1^0, x_2^0, \ldots, x_n^0)$ be a regular point of $\Omega$ for the
family $\mathcal{P}_{\Pi I}$.

A necessary and sufficient condition that there exists a sufficient statistic for $\mathcal{P}_{\Pi I}$
about $x^0$ which is Euclidean of dimension $s < n$ at $x^0$ and continuously differentiable
about $x^0$ is that for some integer $r \le s$ there be $n-r$ of the components of $x^0$, say
$x_{r+1}^0, x_{r+2}^0, \ldots, x_n^0$, with the following properties :

(i) there is an open set $A \subset \Omega_0$ which contains the points $x_{r+1}^0, x_{r+2}^0, \ldots, x_n^0$,
and such that $p_0$ is of exponential type in $A$ :

\[ \log p_0(\xi, \theta) = b_0(\theta) + \psi_0(\xi) + \sum_{\lambda=1}^{r} b_\lambda(\theta)\, \psi_\lambda(\xi), \qquad \xi \in A,\ \theta \in \Theta, \tag{3.34} \]

with all functions appearing in this expression being continuously differentiable ;

(ii) there is no constant linear relation among the $r$ functions $b_\lambda$, $\lambda = 1, 2, \ldots, r$,
in (3.34): that is, the expression on the right in (3.34) is in reduced form ;

(iii) the functions $b_\lambda$, $\lambda = 1, 2, \ldots, r$, are constant linear combinations of the
logarithmic derivative of $p_0$ evaluated at each of the remaining $r$ coordinate points,
$x_1^0, x_2^0, \ldots, x_r^0$, of the $x^0$, thus :

\[ b_\lambda(\theta) = \beta_{\lambda 0} + \sum_{i=1}^{r} A_{\lambda i} \left( \frac{\partial \log p_0(\xi, \theta)}{\partial \xi} \right)_{\xi = x_i^0}, \qquad \lambda = 1, 2, \ldots, r, \tag{3.35} \]

where the matrix $\|A_{\lambda i}\|$ is nonsingular.

For $r = 0$ these conditions are to be understood as follows : (3.35) is vacuous
and the $r$-term sum on the right-hand side of (3.34) is 0.

The integer $r$ for which the above properties are verified is unique and is precisely
the local minimal dimension of a continuously differentiable, Euclidean sufficient statistic
for $\mathcal{P}_{\Pi I}$ about $x^0$.


Proof: It is necessary only to prove that the three conditions stated here
above are equivalent to the three conditions in the statement of Theorem 3.1 when
the $p_i$ are all identically equal to $p_0$. That the present conditions imply those of
Theorem 3.1 is immediate; in fact, for each $m = r+1, r+2, \ldots, n$, the set $A$ itself may
be taken to be the neighbourhood of $x_m^0$ for which (3.5) is to hold, and the $n-r$
equalities (3.5) may be taken all identical with (3.34). As for the remaining conditions,
the correspondence is clear.


Suppose, conversely, that we have the conditions of Theorem 3.1 verified,
with every $p_i$ equal to $p_0$. Then, clearly, the statements there about the $n-r$ factor
densities $p_{r+1}, p_{r+2}, \ldots, p_n$ translate immediately into statements concerning the single
density $p_0$ and its behaviour about the $n-r$ points $x_{r+1}^0, x_{r+2}^0, \ldots, x_n^0$ of $\Omega_0$. We must
show that the several local forms (3.5), valid for respective neighbourhoods $N_m$ say,
of the $x_m^0$, $m = r+1, r+2, \ldots, n$, can be so designated that they constitute a single
statement of the form (3.34). This will be so if the designations can be made so that,
for each $\lambda = 0, 1, \ldots, r$, the two functions $\psi_\lambda^{(m)}$ and $\psi_\lambda^{(m')}$ agree on $N_m \cap N_{m'}$, for every
pair $m, m'$, and so that $b_0^{(m)}$ is independent of $m$. For, clearly, if these properties are
secured then (3.5) goes over into the single statement (3.34) with $A = \bigcup_{m=r+1}^{n} N_m$.

Now, in the proof of Theorem 3.1 we have seen that the $\psi_\lambda^{(m)}$, for $\lambda = 1, 2, \ldots, r$ and all
$m = r+1, r+2, \ldots, n$, may be taken as defined by (3.21). In our present situation of
$p_i = p_0$ for all $i$, this definition entails immediately that these $\psi_\lambda^{(m)}$, which are defined
over all of $\Omega_0$, do not depend on $m$; hence, they will indeed agree on intersections
$N_m \cap N_{m'}$. Furthermore, on such an intersection the two pertinent equalities of
(3.5), namely,

\[ \begin{aligned} \log p_0(\xi, \theta) &= b_0^{(m)}(\theta) + \psi_0^{(m)}(\xi) + \sum_{\lambda=1}^{r} b_\lambda(\theta)\, \psi_\lambda(\xi), \\ \log p_0(\xi, \theta) &= b_0^{(m')}(\theta) + \psi_0^{(m')}(\xi) + \sum_{\lambda=1}^{r} b_\lambda(\theta)\, \psi_\lambda(\xi), \end{aligned} \tag{3.36} \]

then yield

\[ b_0^{(m)}(\theta) + \psi_0^{(m)}(\xi) = b_0^{(m')}(\theta) + \psi_0^{(m')}(\xi), \qquad \xi \in N_m \cap N_{m'},\ \theta \in \Theta. \tag{3.37} \]

From these relations for all pairs $m$ and $m'$ it follows that the functions $b_0^{(m)}$ on $\Theta$ differ
by at most additive constants, and likewise the functions $\psi_0^{(m)}$ on intersections of their
domains. Specifically, there are constants $a_{r+1}, a_{r+2}, \ldots, a_n$ and a function $b_0$ on $\Theta$ such that
$b_0^{(m)}(\theta) = b_0(\theta) + a_m$, $m = r+1, r+2, \ldots, n$; and these constants are furthermore such that the
functions $\psi_0^{(m)}(\xi) + a_m$ agree on intersections of their domains. Hence, if, for each
$m$, the functions $b_0^{(m)}$ and $\psi_0^{(m)}$ are replaced by $b_0^{(m)} - a_m$ and $\psi_0^{(m)} + a_m$, respectively,
then we have designations of the first two terms on the right-hand side of (3.5) such
that the first does not depend on $m$ (since $b_0^{(m)} - a_m = b_0$), and the second terms, for
different $m$, agree on intersections of their domains. This achieved, it then follows,
as indicated above, that the equalities (3.5) combine into the single statement (3.34)
with $A = \bigcup_{m=r+1}^{n} N_m$, by defining $b_0 = b_0^{(m)} - a_m$ and $\psi_0 = \psi_0^{(m)} + a_m$ on $N_m$,
$m = r+1, r+2, \ldots, n$, and $\psi_\lambda = \psi_\lambda^{(m)}$ for all $m \ge r+1$ and all $\lambda > 0$.

The proof of the corollary is complete.
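
As an illustration of conditions (i), (ii) and (iii), the following worked example (ours, not part of the original argument) takes the common factor density to be the normal density with mean $\mu$ and variance $v$, so that $\theta = (\mu, v)$ and $r = 2$. The choice of family, the symbols $\mu$, $v$ and $D_i$, and the assumption that $x^0$ is a regular point with $x_1^0 \neq x_2^0$ are assumptions made only for this sketch.

```latex
% Exponential form (3.34) for the normal density, with A = \Omega_0 = \mathbb{R}:
\log p_0(\xi,\theta)
  = \underbrace{-\tfrac12\log(2\pi v) - \frac{\mu^2}{2v}}_{b_0(\theta)}
  + \underbrace{0}_{\psi_0(\xi)}
  + \underbrace{\frac{\mu}{v}}_{b_1(\theta)}\,\underbrace{\xi}_{\psi_1(\xi)}
  + \underbrace{\Bigl(-\frac{1}{2v}\Bigr)}_{b_2(\theta)}\,\underbrace{\xi^2}_{\psi_2(\xi)} .

% Condition (ii): c_1 b_1 + c_2 b_2 = c_0 for all (\mu, v) gives
% c_1 \mu - c_2/2 = c_0 v, which forces c_0 = c_1 = c_2 = 0.

% Condition (iii): the logarithmic derivative at \xi = x_i^0, i = 1, 2, is
D_i(\theta) = \Bigl(\frac{\partial \log p_0(\xi,\theta)}{\partial \xi}\Bigr)_{\xi = x_i^0}
            = \frac{\mu - x_i^0}{v}
            = b_1(\theta) + 2 x_i^0\, b_2(\theta),
% and solving the two equations for b_1, b_2 gives
b_1 = \frac{x_2^0 D_1 - x_1^0 D_2}{x_2^0 - x_1^0}, \qquad
b_2 = \frac{D_2 - D_1}{2\,(x_2^0 - x_1^0)} .
% The coefficient matrix has determinant 1 / (2 (x_2^0 - x_1^0)) \neq 0.
```

In this example the additive constants in (3.35) are zero, and the coefficient matrix is nonsingular exactly because the two coordinate points $x_1^0$ and $x_2^0$ are distinct.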


Corollary 3.2: Let $\mathcal{P}_{\Pi I}$ be the family of distributions defined in (1.2), wherein,
in particular, the $p_i$, $i = 1, 2, \ldots, n$, are identically the same function, say $p_0$. Let $\Omega_0$
be as in the statement of Corollary 3.1, and let $x^0 = (x_1^0, x_2^0, \ldots, x_n^0)$ be a regular point of
$\Omega$ for $\mathcal{P}_{\Pi I}$. Let $T$ be a statistic of $\mathcal{P}_{\Pi I}$ which is Euclidean of dimension $s < n$ at $x^0$
and continuously differentiable about $x^0$.

A necessary and sufficient condition that $T$ be sufficient for $\mathcal{P}_{\Pi I}$ about $x^0$ is that
for some integer $r \le s$ there be $n-r$ of the components of $x^0$, say $x_{r+1}^0, x_{r+2}^0, \ldots, x_n^0$, for which
the properties (i), (ii) and (iii) of Corollary 3.1 hold, and furthermore that, if $r > 0$ and if
$j_1, j_2, \ldots, j_r$ and $\theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(r)}$ are some particular chosen set of integers and points
of $\Theta$ such that

\[ \det \left\| \left( \frac{\partial^2 \log p_0(\xi, \theta)}{\partial \xi \, \partial \theta_{j_k}} \right)_{\xi = x_i^0,\ \theta = \theta^{(k)}} \right\|_{i, k = 1, 2, \ldots, r} \neq 0 \tag{3.38} \]

(and these will exist), then the statistics

\[ \sum_{i=1}^{n} \left( \frac{\partial \log p_0(\xi, \theta)}{\partial \theta_{j_k}} \right)_{\xi = x_i,\ \theta = \theta^{(k)}}, \qquad k = 1, 2, \ldots, r, \tag{3.39} \]

are functions of $T$ almost everywhere (Lebesgue) in some neighbourhood of $x^0$.

The functions

\[ \psi_k(\xi) = \left( \frac{\partial \log p_0(\xi, \theta)}{\partial \theta_{j_k}} \right)_{\theta = \theta^{(k)}}, \qquad k = 1, 2, \ldots, r, \tag{3.40} \]

that enter into (3.39) are precisely the functions $\psi_\lambda$, $\lambda = 1, 2, \ldots, r$, that figure in one of
the possible representations (3.34) of $p_0$ for some open subset $A$ of $\Omega_0$ containing the points
$x_{r+1}^0, x_{r+2}^0, \ldots, x_n^0$.

Proof : This corollary obtains immediately from Theorem 3.2 by specializing
the statements of that theorem to $\mathcal{P}_{\Pi I}$.

It is worth pointing out explicitly that the more complicated statement above
involving (3.39) and (3.40) cannot be replaced, as might first be suspected, by the simpler
statement that the statistics

\[ \sum_{i=1}^{n} \psi_\lambda(x_i), \qquad \lambda = 1, 2, \ldots, r, \tag{3.41} \]

where the $\psi_\lambda$ here are those appearing in (3.34), are functions of $T$ almost everywhere
in some neighbourhood of $x^0$. The reason for this is that while the functions defined
by (3.40) over all of $\Omega_0$ will provide a representation (3.34) for a subset $A$ of $\Omega_0$, the
functions $\psi_\lambda$ as given by (3.34) are defined only over $A$, and even if they are defined
over all of $\Omega_0$, their definition over $\Omega_0 - A$ may be completely irrelevant to present
questions, so that the first $r$ terms in each of the sums (3.41) may be arbitrary
and meaningless.
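
To make the statistics (3.39) concrete, here is a small numerical sketch in Python. It is an illustration under stated assumptions, not the authors' construction: the common factor density is taken to be normal with $\theta = (\mu, v)$, the points $\theta^{(1)} = \theta^{(2)} = (0, 1)$ are chosen, and the names `log_p0`, `psi` and `statistics` are hypothetical. The derivatives in $\theta$ are approximated by central differences and compared with the closed forms $\sum x_i$ and $\sum (x_i^2 - 1)/2$.

```python
import math

def log_p0(xi, mu, var):
    """Log of the N(mu, var) density at xi; plays the role of log p_0(xi, theta)."""
    return -0.5 * math.log(2.0 * math.pi * var) - (xi - mu) ** 2 / (2.0 * var)

def psi(xi, k, theta=(0.0, 1.0), h=1e-5):
    """Central-difference estimate of d log p_0 / d theta_k at the point theta.

    k = 0 differentiates in the mean, k = 1 in the variance (assumed indexing).
    """
    lo, hi = list(theta), list(theta)
    lo[k] -= h
    hi[k] += h
    return (log_p0(xi, *hi) - log_p0(xi, *lo)) / (2.0 * h)

def statistics(x, theta=(0.0, 1.0)):
    """One sum over the sample per parameter coordinate, in the manner of (3.39)."""
    return tuple(sum(psi(xi, k, theta) for xi in x) for k in range(2))

sample = [0.3, -1.2, 2.5, 0.7]
numeric = statistics(sample)
# Closed forms at theta = (0, 1): d/dmu gives xi, d/dvar gives (xi^2 - 1) / 2.
closed_form = (sum(sample), sum((xi * xi - 1.0) / 2.0 for xi in sample))
print(numeric, closed_form)
```

The two tuples agree up to finite-difference error, which shows that for this family the statistics (3.39) reduce to the familiar pair (sample sum, sample sum of squares) up to an affine change.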


4. ANALYTIC DENSITY FUNCTIONS

Preparatory to obtaining, in Section 5, what global results can be stated at
our intended level of generality in the case of analytic product densities, we now
withdraw for a moment from our concentration on product densities to consider
analytic density functions generally, be they of the product type or not. Thus, we consider
a family $\mathcal{P} = \{P_\theta,\ \theta \in \Theta\}$ of probability measures on the Lebesgue subsets of an open
set $\Omega$ in $E_n$, a Euclidean $n$-space. The index set $\Theta$ is an open set in a Euclidean
$\nu$-space, $E_\nu$. Each $P_\theta$ is absolutely continuous with respect to Lebesgue measure, and
we suppose there is a determination, $p$, of the family density function such that
$p(x, \theta) > 0$ throughout $\Omega \times \Theta$, and such that for a fixed open, connected set
$\Omega' \subset \Omega$, $p$ is analytic in $\Omega' \times \Theta$. The set $\Omega'$ might, in particular, be all of $\Omega$,
but we need not insist on this for our present deliberations.

We shall avail ourselves of the following standard property of analytic functions:

Lemma 4.1: If two real-valued functions are analytic in an open connected
subset $Y$ of a Euclidean space, and they are equal at all points of a (nonempty) open
subset of $Y$, then they are equal everywhere in $Y$.

The statement and proof of this fact may be found on p. 202 of Dieudonné
(1960).
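
The role of analyticity in this lemma can be seen from a standard example, added here purely as an illustration (the symbol $\varphi$ is ours): a function that is infinitely differentiable but not analytic can agree with the zero function on an open set without vanishing identically.

```latex
% A C^\infty function on \mathbb{R} that is not analytic at t = 0:
\varphi(t) =
\begin{cases}
  e^{-1/t}, & t > 0, \\
  0,        & t \le 0 .
\end{cases}
% \varphi and the zero function agree on the open set (-\infty, 0),
% yet \varphi(1) = e^{-1} \neq 0, so they are not equal on all of \mathbb{R}.
% The lemma is not contradicted: every derivative of \varphi vanishes at 0,
% so its Taylor series at 0 is identically zero while \varphi is not,
% and hence \varphi is not analytic in any neighbourhood of 0.
```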


Consider a regular point, $x^0$, in $\Omega'$, and for brevity set $\rho(x^0) = r_0$. Suppose
first that $r_0 > 0$. Then, with a re-indexing of the $x_i$ if necessary, there exist integers
$j_1, j_2, \ldots, j_{r_0}$ between 1 and $\nu$ inclusive, and points $\theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(r_0)}$ of $\Theta$ such that the
matrix

\[ A(x) = \left\| \frac{\partial^2 \log p}{\partial x_i \, \partial \theta_{j_k}} \bigl(x, \theta^{(k)}\bigr) \right\|, \qquad i, k = 1, 2, \ldots, r_0, \tag{4.1} \]

is nonsingular at $x^0$. Now, $\det A(x)$ is analytic in $\Omega'$. It can, therefore, not vanish
in an open subset of $\Omega'$; for, if it did, then by Lemma 4.1 it would vanish everywhere
in $\Omega'$, and in particular it would vanish at $x^0$, contradicting the established fact that
$A(x^0)$ is nonsingular. Thus, the points at which $A$ is nonsingular are dense in $\Omega'$, and it
follows, by the definition of the function $\rho$, that $\rho(x) \ge r_0$ for all $x \in \Omega'$. If we now
consider that our choice of $x^0$ is such that $r_0$ is the largest value that $\rho$ takes on at a
regular point in $\Omega'$, and if we take account of the fact that the value of $\rho$ at a nonregular
point does not exceed the value of $\rho$ at a suitable, arbitrarily nearby regular point,
then we obtain our ultimate conclusion : the function $\rho$ is constant on $\Omega'$. If $r_0 = 0$,
then the assumption that $\rho(x^1) > 0$ for some regular point $x^1$ in $\Omega'$ leads, by the above
argument, to the conclusion that $\rho$ has a constant positive value everywhere in $\Omega'$,
thus contradicting the fact that $\rho(x^0) = r_0 = 0$. Hence $\rho$ vanishes at all regular points
of $\Omega'$, and so it vanishes everywhere in $\Omega'$. The conclusion that $\rho$ is constant on $\Omega'$
therefore holds in all cases. Thus, the local least possible dimension of a continuously
differentiable, Euclidean $\mathcal{P}$-sufficient statistic is the same at all points of $\Omega'$ (see
Theorems 2.1 and 2.2).
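
A concrete instance of the density argument may help. The following sketch is our own illustration: it assumes a sample from the normal family with $\theta = (\mu, v)$, takes $r_0 = 2$ for granted for that family, and chooses $\theta^{(1)} = \theta^{(2)} = (0, 1)$ with $j_1$ the index of $\mu$ and $j_2$ the index of $v$.

```latex
% For p(x,\theta) = \prod_i (2\pi v)^{-1/2} \exp\{-(x_i - \mu)^2 / (2v)\}:
\frac{\partial \log p}{\partial x_i} = -\frac{x_i - \mu}{v}, \qquad
\frac{\partial^2 \log p}{\partial x_i\,\partial \mu} = \frac{1}{v}, \qquad
\frac{\partial^2 \log p}{\partial x_i\,\partial v} = \frac{x_i - \mu}{v^2}.
% Evaluating at \theta = (0, 1) for i = 1, 2 gives the matrix of (4.1):
A(x) = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \end{pmatrix}, \qquad
\det A(x) = x_2 - x_1 .
% \det A is analytic and vanishes only on the hyperplane x_1 = x_2,
% a closed set with empty interior, so A is nonsingular on a dense open set.
```

The determinant is a nonzero analytic function, so its zero set cannot contain an open subset, which is exactly the mechanism used in the argument above.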

Let us continue to denote by $r_0$ the constant value of $\rho$ on $\Omega'$. Again suppose
first that $r_0 > 0$. Let $x^0$ be any particular regular point of $\Omega'$. Consider the matrix
$A(x)$ of (4.1) constructed with reference to the point $x^0$; that is, so that $A(x^0)$ is
nonsingular. Then, by Theorem 2.3, the $r_0$ functions

\[ \eta_k(x) = \left( \frac{\partial \log p(x, \theta)}{\partial \theta_{j_k}} \right)_{\theta = \theta^{(k)}}, \qquad k = 1, 2, \ldots, r_0, \tag{4.2} \]

constitute a sufficient statistic for $\mathcal{P}$ about $x^0$. Now, in the argumentation above
we have seen that there is a dense (open) set of regular points in $\Omega'$ at each of which
$A$ is nonsingular. Hence, the same set of functions (4.2) constitute (among all
continuously differentiable, Euclidean statistics) a minimal dimensional, Euclidean
sufficient statistic about each of a dense subset of regular points of $\Omega'$.

If $r_0 = 0$, then a constant function supplies such a statistic, and it is, in fact,
of minimal dimension at every regular point of $\Omega'$. Moreover, in this case every point
of $\Omega'$ is a regular point. This is a ready consequence of the nature of the function $\rho$.

Let $T_0 = (\eta_1, \eta_2, \ldots, \eta_{r_0})$, supposing that $r_0 > 0$. The question presents
itself immediately whether or not $T_0$ is in fact sufficient for $\mathcal{P}$ in $\Omega'$. According to
our conclusions thus far, if $x^1$ is any particular one of a certain dense subset of regular
points of $\Omega'$, there exist functions $f^1$ and $g^1$ (with restricted domains of definition) such
that the relation

\[ p(x, \theta) = f^1\bigl(T_0(x), \theta\bigr)\, g^1(x) \tag{4.3} \]

is valid for all $\theta \in \Theta$ and all $x$ in some neighbourhood $N^1$ of $x^1$; and additionally, by virtue
of an earlier lemma, the functions $f^1$ and $g^1$ may be taken to have differentiability properties.
This exhibits $T_0$ as a sufficient statistic for $\mathcal{P}$ about $x^1$. The question that
has been raised is whether or not in fact there exist functions $f$ and $g$, defined for all
$\Omega'$ and $\Theta$, so that (4.3), with $f$ and $g$ in place of $f^1$ and $g^1$, holds for all
$\theta \in \Theta$ and everywhere in $\Omega'$, or at least almost everywhere (Lebesgue) in $\Omega'$ for each $\theta \in \Theta$
(see Definition 2.1). It appears that the answer to this question is not always in the
affirmative. We must content ourselves with asserting less than this in the general case.
Consider (4.3) in particular for the regular point $x^0$ which led us to the
functions (4.2), that is, to the vector function $T_0$. For this point let $f^0$ and $g^0$ be the
pertinent functions in (4.3), and let $N^0$ be the pertinent neighbourhood of $x^0$. Then we
have, about $x^0$, for all $\theta \in \Theta$,

\[ \frac{\partial \log p(x, \theta)}{\partial \theta_j} = \frac{\partial \log f^0\bigl(T_0(x), \theta\bigr)}{\partial \theta_j}, \qquad j = 1, 2, \ldots, \nu. \tag{4.4} \]

From these equations we conclude that the partial derivatives
$\partial \log f^0(T_0(x), \theta) / \partial \theta_j$ are analytic functions on $N^0 \times \Theta$, and it follows that
$f^0(T_0(x), \theta)$ may be taken analytic in $N^0 \times \Theta$. Then, from the relation

\[ \log p(x, \theta) = \log f^0\bigl(T_0(x), \theta\bigr) + \log g^0(x), \tag{4.5} \]

it follows that $g^0$ is analytic in $N^0$.

Now suppose $f^0$ has a nonvanishing extension $f$ over $\mathcal{R}_{T_0 \mid \Omega'} \times \Theta$, where $\mathcal{R}_{T_0 \mid \Omega'}$
denotes the range of $T_0$ restricted to $\Omega'$, such that $f(T_0(x), \theta)$ is analytic in $\Omega' \times \Theta$. Then
the analytic derivatives $\partial \log f(T_0(x), \theta) / \partial \theta_j$ are identical with the right-hand sides of (4.4)
in $N^0 \times \Theta$, and are consequently equal to the left-hand sides of (4.4) in $N^0 \times \Theta$. Since
the left-hand sides of (4.4) and the derivatives $\partial \log f(T_0(x), \theta) / \partial \theta_j$ are all analytic in
$\Omega' \times \Theta$, it follows by Lemma 4.1 that

\[ \frac{\partial \log f\bigl(T_0(x), \theta\bigr)}{\partial \theta_j} = \frac{\partial \log p(x, \theta)}{\partial \theta_j}, \qquad j = 1, 2, \ldots, \nu;\ (x, \theta) \in \Omega' \times \Theta. \tag{4.6} \]

Note that $\Theta$ need not be connected, and therefore $\Omega' \times \Theta$ may not be connected. But
in this case (4.6) is established by reasoning separately with each connected component
of $\Theta$. From (4.6) it follows that

\[ p(x, \theta) = f\bigl(T_0(x), \theta\bigr)\, g(x), \qquad (x, \theta) \in \Omega' \times \Theta, \tag{4.7} \]

for some function $g$, and as reasoned above, the function $g$ is analytic in $\Omega'$.

Thus, we have shown that if $f^0$ has a nonvanishing extension $f$ over
$\mathcal{R}_{T_0 \mid \Omega'} \times \Theta$ such that $f(T_0(x), \theta)$ is analytic in $\Omega' \times \Theta$, then $T_0$ is sufficient for $\mathcal{P}$ in $\Omega'$,
and moreover the factorization (4.7) is in terms of analytic functions.

In the case $r_0 = 0$, the function $T_0$ as given by (4.2) is to be replaced by $T_0$ = a
fixed numerical constant. And in this case the argumentation proceeding from
(4.4) is still valid, so that the statistic $T_0$ = a fixed numerical constant is sufficient in
$\Omega'$ if, for some analytic factorization (4.3), $f^0$ has a nonvanishing extension $f$ over
$\mathcal{R}_{T_0 \mid \Omega'} \times \Theta$ such that $f(T_0(x), \theta)$ is analytic over $\Omega' \times \Theta$. But in this case there is
such an extension immediately evident : since $T_0$ is constant, with value $a$, say,
$f^0(T_0(x), \theta)$, defined on $N^0 \times \Theta$, is an analytic function of $\theta$ alone, throughout $\Theta$; therefore, define
$f(T_0(x), \theta) = f^0(a, \theta)$ for all $(x, \theta) \in \Omega' \times \Theta$. The function $f$ is the desired extension.
And so, in the case of $r_0 = 0$, a statistic of the form $T_0$ = a fixed numerical constant is
sufficient for $\mathcal{P}$ in $\Omega'$.

We now sum up the results of this section in the following theorem. 


Theorem 4.1: Let $p$ be a family density function as described at the beginning
of this section and, as hypothesized, let $\Omega'$ be an open connected subset of $\Omega$ such that $p$
is analytic in $\Omega' \times \Theta$. Then the following facts hold :

(i) The function $\rho$ is constant over $\Omega'$.

(ii) If $r_0$ is the constant value of $\rho$ in $\Omega'$, and if $x^0$ is any particular regular
point of $\Omega'$, and if $T_0 = (\eta_1, \eta_2, \ldots, \eta_{r_0})$ is the analytic (over $\Omega'$) function given by (4.2)
in case $r_0 > 0$, and $T_0$ = a fixed numerical constant in case $r_0 = 0$, then $T_0$ is, among all
continuously differentiable, Euclidean statistics, a minimal dimensional, Euclidean
sufficient statistic for $\mathcal{P}$ about each point of a dense subset $R_0$ of regular points of $\Omega'$.
In particular, $R_0 = \Omega'$ in case $r_0 = 0$.

(iii) $p$ has an analytic factorization in terms of $T_0$ about $x^0$ (and equally well,
about each point of $R_0$). That is, for some neighbourhood $N^0$ of $x^0$,

\[ p(x, \theta) = f^0\bigl(T_0(x), \theta\bigr)\, g^0(x), \qquad (x, \theta) \in N^0 \times \Theta, \tag{4.8} \]

where $f^0(T_0(x), \theta)$ is analytic over $N^0 \times \Theta$ and $g^0$ is analytic over $N^0$.

(iv) $T_0$ is a sufficient statistic for $\mathcal{P}$ in $\Omega'$ if, for some analytic factorization
(4.8), the function $f^0$ on $\mathcal{R}_{T_0 \mid N^0} \times \Theta$ has a nonvanishing extension $f$ over $\mathcal{R}_{T_0 \mid \Omega'} \times \Theta$
such that $f(T_0(x), \theta)$ is analytic over $\Omega' \times \Theta$. And then we have

\[ p(x, \theta) = f\bigl(T_0(x), \theta\bigr)\, g(x), \qquad (x, \theta) \in \Omega' \times \Theta, \tag{4.9} \]

with $g$ also analytic over $\Omega'$. In particular, in the case $r_0 = 0$ there is such an extension,
so that a statistic of the form $T_0$ = a fixed numerical constant is, in this case, sufficient
for $\mathcal{P}$ in $\Omega'$.
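
The global factorization (4.9) can be checked numerically in a simple case. The sketch below is an assumption-laden illustration rather than part of the theorem: it again uses a normal sample with $\theta = (\mu, v)$, takes $T_0(x) = (\sum x_i, \sum (x_i^2 - 1)/2)$ and $g \equiv 1$, and the function names `joint_density`, `t0` and `f` are hypothetical.

```python
import math

def joint_density(x, mu, var):
    """Product of N(mu, var) densities over the sample x, i.e. p(x, theta)."""
    value = 1.0
    for xi in x:
        value *= math.exp(-(xi - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
    return value

def t0(x):
    """Assumed statistic T_0(x) = (sum x_i, sum (x_i^2 - 1) / 2) for the normal example."""
    return (sum(x), sum((xi * xi - 1.0) / 2.0 for xi in x))

def f(t, n, mu, var):
    """Factor that depends on x only through t = T_0(x); here g(x) = 1."""
    t1, t2 = t
    sum_of_squares = 2.0 * t2 + n                          # recover sum x_i^2 from t2
    quad = sum_of_squares - 2.0 * mu * t1 + n * mu * mu    # equals sum (x_i - mu)^2
    return (2.0 * math.pi * var) ** (-n / 2.0) * math.exp(-quad / (2.0 * var))

sample = [0.3, -1.2, 2.5, 0.7]
checks = [
    math.isclose(joint_density(sample, mu, var),
                 f(t0(sample), len(sample), mu, var),
                 rel_tol=1e-9)
    for mu, var in [(0.0, 1.0), (1.5, 0.5), (-2.0, 3.0)]
]
print(checks)
```

Here the extension of $f$ to the whole range of $T_0$ is immediate because the closed form is defined and analytic for every value of $t$; the theorem's hypothesis in (iv) concerns families where such an extension is not evident.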
5. THE GLOBAL RESULTS FOR A (PARTIALLY OR FULLY)
ANALYTIC FAMILY OF TYPE $\mathcal{P}_\Pi$

We are now ready to combine the local theorems of Section 3 with the general
consequences of analyticity as seen in Section 4, to obtain the desired global results.

We consider a product family $\mathcal{P}_\Pi$ as given by (1.2), and such that, furthermore, for
each $i = 1, 2, \ldots, n$ there is an open, connected subset (i.e., an open interval) $\Omega_i'$ of
$\Omega_i$ such that $p_i$ is analytic in $\Omega_i' \times \Theta$. Then $\Omega' = \Omega_1' \times \Omega_2' \times \ldots \times \Omega_n'$ is an open,
connected subset of $\Omega$, and $p$ is analytic in $\Omega' \times \Theta$. It may be that each $\Omega_i$ is itself
an interval and that $p_i$ is analytic in $\Omega_i \times \Theta$ for each $i$; in this case we may take $\Omega_i' = \Omega_i$
for each $i$, and therefore $\Omega' = \Omega$. Or, it may be that some or all of the $\Omega_i$ are not
connected, but $p_i$ is analytic in $\Omega_i \times \Theta$ for each $i$; our results below can be applied in this
case through separate consideration of the several connected components of the $\Omega_i$.
Our investigations in the present section will be concerned exclusively with the subset
$\Omega' \times \Theta$ of $\Omega \times \Theta$; and outside this subset $p$ may or may not be analytic. That is, in
general, we are directing our attention to a part of the domain of $p$, which may be a
proper subset, on which $p$ is analytic while it may not be analytic in the complementary
part of the domain. Thus, the family $\mathcal{P}_\Pi$ may be only partially analytic, and the
pertinence of our term “global” is to just the analytic part of the domain. On the
other hand, $\mathcal{P}_\Pi$ may be fully analytic (that is, $p$ may be analytic over all of $\Omega \times \Theta$),
and in this case our stated results can be applied, as we have just remarked, for each
connected component of $\Omega$ separately, and “global” then can be warrantedly understood
to refer to the full domain of $p$.

The result nearest at hand is the following :


Theorem 5.1: Let $\mathcal{P}_\Pi$ be a product family of distributions as defined by
(1.2), with the additional property that $p$ is analytic in $\Omega' \times \Theta$ where $\Omega' = \Omega_1' \times \Omega_2' \times \ldots \times \Omega_n'$
is an $n$-dimensional open interval and the $\Omega_i' \times \Theta$ are sets of analyticity of the respective
factors of $p$, as described above.

A necessary condition that there exists a statistic which is sufficient for $\mathcal{P}_\Pi$ in
$\Omega'$ and is Euclidean of dimension $s < n$ in $\Omega'$, and which is continuously differentiable
about some regular point of $\Omega'$, is that for some integer $r \le s$ there be $n-r$ of the factor
densities, say $p_{r+1}, p_{r+2}, \ldots, p_n$, with the following three properties :

(i) the $p_m$, $m = r+1, r+2, \ldots, n$, are of exponential type in the respective
domains $\Omega_m' \times \Theta$, as follows :

\[ \log p_m(\xi, \theta) = b_0^{(m)}(\theta) + \psi_0^{(m)}(\xi) + \sum_{\lambda=1}^{r} b_\lambda(\theta)\, \psi_\lambda^{(m)}(\xi), \qquad (\xi, \theta) \in \Omega_m' \times \Theta;\ m = r+1, r+2, \ldots, n, \tag{5.1} \]

with all functions appearing in these expressions being analytic in the pertinent domain
$\Omega_m' \times \Theta$ (or, equivalently, in $\Omega_m'$ or in $\Theta$, respectively) ;

(ii) there is no constant linear relation among the $r$ common parametric functions
$b_\lambda$, $\lambda = 1, 2, \ldots, r$, that enter into the $n-r$ expressions (5.1); that is, every one of
these expressions is in reduced form ;

(iii) for suitable representations (5.1) of the $p_m$, $m = r+1, r+2, \ldots, n$, the functions
$b_\lambda$, $\lambda = 1, 2, \ldots, r$, are constant linear combinations of the logarithmic derivatives,
evaluated at some regular point $x^0 = (x_1^0, x_2^0, \ldots, x_n^0)$, of the remaining $r$ factor densities,
thus :

\[ b_\lambda(\theta) = \beta_{\lambda 0} + \sum_{i=1}^{r} A_{\lambda i} \left( \frac{\partial \log p_i(\xi, \theta)}{\partial \xi} \right)_{\xi = x_i^0}, \qquad \lambda = 1, 2, \ldots, r, \tag{5.2} \]

wherein the matrix $\|A_{\lambda i}\|$ is nonsingular.

For $r = 0$ these conditions are to be understood as follows : (5.2) is vacuous and the
$r$-term sum on the right-hand side of (5.1) is 0.

The integer $r$ for which the above properties are verified is unique and is precisely
the constant value of the function $\rho$ in $\Omega'$.

Proof: Suppose there is a statistic which is sufficient for $\mathcal{P}_\Pi$ in $\Omega'$ and is
Euclidean of dimension $s < n$ in $\Omega'$, and which is continuously differentiable about
some regular point of $\Omega'$, say $x^0 = (x_1^0, x_2^0, \ldots, x_n^0)$. Then, a fortiori, this statistic is
sufficient for $\mathcal{P}_\Pi$ about $x^0$. Consequently, Theorem 3.1 applies and (with suitable
re-indexing if necessary) the $p_m$, $m = r+1, r+2, \ldots, n$, have the local exponential form
(3.5) described by (i), (ii) and (iii) of that theorem. By virtue of the present
analyticity assumptions we see by (3.21) and (3.24) that the functions $b_\lambda$ and $\psi_\lambda^{(m)}$
in (3.5) are analytic in, respectively, $\Theta$ and the pertinent $\Omega_m'$. It follows that also
the $b_0^{(m)}$ and $\psi_0^{(m)}$ are analytic in $\Theta$ and $\Omega_m'$ respectively. Thus, the equations (3.5),
considered in turn for each open, connected component of $\Theta$, state the equality, in an
open subset, of two functions analytic over an open, connected set. Therefore, by
Lemma 4.1, the equality holds over the entire open, connected set. Aggregating these
results for all the components of $\Theta$, we obtain the assertion (i) of the present theorem,
giving the exponential form of $p_m$ over all of $\Omega_m' \times \Theta$, for each $m = r+1, r+2, \ldots, n$.

Assertions (ii) and (iii) of the present theorem are immediate consequences
of (ii) and (iii) of Theorem 3.1.

Consider finally the last assertion of the present theorem. Conditions (i),
(ii) and (iii), applied locally about the regular point $x^0$ in question, are precisely the
conditions (i), (ii) and (iii) of Theorem 3.1. Therefore, by the last statement of that
theorem, we have necessarily $r = \rho(x^0)$. But by Theorem 4.1, $\rho$ is constant over $\Omega'$.
Hence, $r$ is, as asserted, the constant value of $\rho$ over $\Omega'$.

This completes the proof of Theorem 5.1.
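
A simple family with nonidentical factor densities shows what conditions (i) and (iii) look like in practice. The following worked example is ours: it assumes $p_i$ is the normal density with mean $\theta c_i$ and unit variance, where $c_1, c_2, \ldots, c_n$ are known nonzero constants (hypothetical symbols), $\Theta$ is the real line, and $r = 1$; regularity of $x^0$ is taken for granted.

```latex
% Exponential form (5.1), valid for every i with \Omega_i' = \mathbb{R}:
\log p_i(\xi,\theta)
  = \underbrace{-\tfrac12 c_i^2\theta^2}_{b_0^{(i)}(\theta)}
  + \underbrace{-\tfrac12\log(2\pi) - \tfrac12\xi^2}_{\psi_0^{(i)}(\xi)}
  + \underbrace{\theta}_{b_1(\theta)}\,\underbrace{c_i\,\xi}_{\psi_1^{(i)}(\xi)} .

% Condition (iii) with the remaining factor density p_1:
\Bigl(\frac{\partial \log p_1(\xi,\theta)}{\partial \xi}\Bigr)_{\xi = x_1^0}
  = c_1\theta - x_1^0
\quad\Longrightarrow\quad
b_1(\theta) = \theta
  = \frac{x_1^0}{c_1}
  + \frac{1}{c_1}\Bigl(\frac{\partial \log p_1(\xi,\theta)}{\partial \xi}\Bigr)_{\xi = x_1^0}.
% The 1 x 1 coefficient matrix (1 / c_1) is nonsingular because c_1 \neq 0.
```

The common parametric function $b_1$ is shared by all factors, while the functions $\psi_1^{(i)}$ differ from factor to factor through the constants $c_i$; this is the feature that the identical-factor theory cannot express.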

The next two theorems give circumstances under which the three conditions 
of Theorem 5.1 are sufficient as well as necessary. 

Theorem 5.2: Let $\mathcal{P}_\Pi$ be a product family of distributions as described in the
hypothesis of Theorem 5.1. Let conditions (i), (ii) and (iii) of that theorem hold for some
$r \le s < n$ and for some regular point $x^0 = (x_1^0, x_2^0, \ldots, x_n^0)$. If $r = 0$, a statistic of the form
$T$ = a fixed numerical constant is sufficient for $\mathcal{P}_\Pi$ in $\Omega'$, and it is analytic and of
dimension $r = 0$ over $\Omega'$. Therefore, any statistic which is Euclidean of dimension $s \ge r = 0$ in
$\Omega'$ and is continuously differentiable about some regular point of $\Omega'$ is sufficient for $\mathcal{P}_\Pi$ in $\Omega'$.

If $r > 0$, then for suitable integers $j_1, j_2, \ldots, j_r$ and points $\theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(r)}$
of $\Theta$, the statistic $T_0 = (\eta_1, \eta_2, \ldots, \eta_r)$, where

\[ \eta_k(x) = \sum_{i=1}^{n} \left( \frac{\partial \log p_i(x_i, \theta)}{\partial \theta_{j_k}} \right)_{\theta = \theta^{(k)}}, \qquad k = 1, 2, \ldots, r, \tag{5.3} \]

is sufficient for $\mathcal{P}_\Pi$ in some neighbourhood $N^0$ of $x^0$ and there are functions $f^0$ and $g^0$ such
that

\[ p(x, \theta) = f^0\bigl(T_0(x), \theta\bigr)\, g^0(x), \qquad (x, \theta) \in N^0 \times \Theta, \tag{5.4} \]

where $f^0(T_0(x), \theta)$ and $g^0(x)$ are analytic over $N^0 \times \Theta$. The statistic $T_0$ is analytic
over $\Omega'$. Of the functions (which are all analytic over the respective $\Omega_i'$)

\[ \psi_k^{(i)}(\xi) = \left( \frac{\partial \log p_i(\xi, \theta)}{\partial \theta_{j_k}} \right)_{\theta = \theta^{(k)}}, \qquad i = 1, 2, \ldots, n;\ k = 1, 2, \ldots, r, \tag{5.5} \]

that enter into (5.3), those with index $i \ge r+1$ are precisely the functions $\psi_\lambda^{(m)}$, $\lambda = 1, 2,
\ldots, r$; $m = r+1, r+2, \ldots, n$, that figure in one of the possible sets of representations (5.1)
of the $p_m$, $m = r+1, r+2, \ldots, n$.

Now, if the function $f^0$, defined on $\mathcal{R}_{T_0 \mid N^0} \times \Theta$, has a nonvanishing extension $f$
over $\mathcal{R}_{T_0 \mid \Omega'} \times \Theta$ such that $f(T_0(x), \theta)$ is analytic over $\Omega' \times \Theta$, then $T_0$, analytic and of
dimension $r$ in $\Omega'$, is a sufficient statistic for $\mathcal{P}_\Pi$ in $\Omega'$. And it follows, a fortiori, that
there exists a statistic which is sufficient for $\mathcal{P}_\Pi$ in $\Omega'$, is Euclidean of dimension $s \ge r$ in
$\Omega'$, and is continuously differentiable about some regular point of $\Omega'$.

Proof : Under the present hypothesis, if $r = 0$, then the very last statement of
Theorem 4.1 gives us that $T_0$ = a fixed numerical constant is a sufficient statistic for
$\mathcal{P}_\Pi$ in $\Omega'$. Since any function of which a sufficient statistic is (almost everywhere)
a function is likewise sufficient, and since $T_0$ = a constant is a function of any function
on $\Omega'$, it follows in particular, as asserted, that any statistic which is Euclidean of
dimension $s \ge r = 0$ in $\Omega'$ and is continuously differentiable about some regular point
of $\Omega'$ is sufficient for $\mathcal{P}_\Pi$ in $\Omega'$.

If $r > 0$, then (5.3) and (5.4), and their attendant remarks, follow from Theorem
4.1. And the assertion regarding (5.5) is a consequence of Theorem 3.2. That $T_0$
is sufficient for $\mathcal{P}_\Pi$ in $\Omega'$ if $f^0$ has the indicated extension $f$ is an immediate consequence
of (iv) of Theorem 4.1. And finally, the very last statement of the present theorem
is established by augmenting $T_0$ with any $s-r$ continuously differentiable components
(if $r < s$).

This completes the proof of Theorem 5.2.

Next, we have sufficient conditions as follows : 


Theorem 5.3: Let $\mathcal{P}_\Pi$ be a product family of distributions as described in the
hypothesis of Theorem 5.1. In addition, let the nonregular points of $\Omega'$ constitute a set
of Lebesgue measure 0.

Then, conditions (i), (ii) and (iii) of Theorem 5.1, for some $r \le s$, are sufficient
that there exists a statistic which is sufficient for $\mathcal{P}_\Pi$ in $\Omega'$ and is Euclidean of dimension
$s < n$ in $\Omega'$, and which is continuously differentiable about some regular point of $\Omega'$.

In fact, in this case, there exists a statistic which is sufficient for $\mathcal{P}_\Pi$ in $\Omega'$ and
is Euclidean of dimension $r$ in $\Omega'$, and which is regular at and analytic about almost every
point of $\Omega'$.

Proof: According to the last statement in Theorem 5.1, we have $r = \rho(x)$
for every $x \in \Omega'$, and in particular for every regular point $x^0 \in \Omega'$. According to
Theorem 2.4 there is a $\mathcal{P}_\Pi$-sufficient statistic, $T^*$, which, for almost all (Lebesgue)
regular points $x^0$ of $\Omega$, is Euclidean of dimension $\rho(x^0)$ at $x^0$, and regular at $x^0$. This
statistic is therefore, a fortiori, sufficient for $\mathcal{P}_\Pi$ in $\Omega'$, is Euclidean of dimension $r$ at
almost all regular points of $\Omega'$ and is regular at each of these regular points. Moreover,
$T^*$ is analytic in a neighbourhood of each of these points, as we see by its construction
in Barankin and Katz (1959) from elements of the form of $T_0$ in Section 4
above, these elements being clearly analytic. Since, under our present hypothesis, the
nonregular points of $\Omega'$ form a set of Lebesgue measure 0, it follows that $T^*$ is, in fact,
Euclidean of dimension $r$ at almost all points of $\Omega'$ and regular at and analytic about
each of these points. On the Lebesgue null set in $\Omega'$ for which this is not true, $T^*$
may be altered, if necessary, to have dimension $r$, and thus the resulting statistic is,
as asserted, sufficient for $\mathcal{P}_\Pi$ in $\Omega'$, Euclidean of dimension $r$ in $\Omega'$, and regular at
and analytic about almost every point of $\Omega'$.

If $r < s$, the statistic just discerned may be augmented by any $s-r$ continuously
differentiable components, and thereby we establish, a fortiori, the weaker
assertion of the present theorem; namely, that there exists a statistic which is sufficient
for $\mathcal{P}_\Pi$ in $\Omega'$, is Euclidean of dimension $s < n$ in $\Omega'$, and is continuously differentiable
about some regular point of $\Omega'$.

Theorem 5.3 is therefore proved.

The above three theorems are the global existence theorems that generalize,
to the case of nonidentical factor densities, the results of Fisher, Darmois, Koopman
and Pitman. We give now the global theorems that generalize, in particular, the
Koopman form of results, wherein a specific statistic $T$ is considered. Thus, the next
two theorems, as a pair, stand, at the global level, in the same relation to the pair of
Theorems 5.1 and 5.2 as Theorem 3.2 stands to Theorem 3.1 at the local level. Of
course, Theorem 5.3 yields a sufficient condition for the sufficiency of a given $T$ just
as well as Theorem 5.2 does, and the form and proof of this result will be quite evident
from our ensuing discussion. (The crucial additional requirement is simply that
$T^*$ be almost everywhere a function of $T$.) However, we shall not here set down
explicitly this implication of Theorem 5.3, because it is the sufficient condition drawn
rather from Theorem 5.2 that is the natural companion of the necessary condition that
derives from Theorem 5.1. This will be clear on the face of it when we have set down
our next two theorems; and it will be abundantly verified with the subsequent
specialization to the case of identical factor densities.

Theorem 5.4: Let $\mathcal{P}$ be a product family of distributions as described in the
hypothesis of Theorem 5.1. And let $T$ be a statistic which is Euclidean of dimension
$s < n$ in $\Omega'$, and is continuously differentiable about some regular point,
$x^0 = (x_1^0, x_2^0, \ldots, x_n^0)$, of $\Omega'$.

A necessary condition that $T$ be sufficient for $\mathcal{P}$ in $\Omega'$ is that, for some
$r \le s$ and for the regular point $x^0$, the assertions (i), (ii) and (iii) of Theorem 5.1 hold.
And furthermore (it is necessary that), if $T_0$ denotes the statistic defined in the statement of
Theorem 5.2, relative to the regular point $x^0$, then $T_0$ is almost everywhere (Lebesgue) a
function of $T$ in $\Omega'$.

Proof: The first assertion of necessity is an immediate consequence of Theorem
5.1. (The proof of that theorem shows that the regular point $x^0$ may be pre-assigned,
as it is here.) The assertion that $T_0$ is almost everywhere a function of $T$ in $\Omega'$ follows
directly from Theorem 2.5 in case $r > 0$. If $r = 0$ the assertion is obvious, since $T_0$
is then constant over $\Omega'$.

The proof is complete.

Theorem 5.5: Let $\mathcal{P}$ be a product family of distributions as described in the
hypothesis of Theorem 5.1. And let $T$ be a statistic which is Euclidean of dimension
$s < n$ in $\Omega'$, and is continuously differentiable about some regular point,
$x^0 = (x_1^0, x_2^0, \ldots, x_n^0)$, of $\Omega'$.

Sufficient conditions that $T$ be sufficient for $\mathcal{P}$ in $\Omega'$ are as follows:

(1) for some $r \le s$ and for the given regular point $x^0$, the conditions (i), (ii) and
(iii) of Theorem 5.1 hold;

(2) either $r = 0$ or, if $r > 0$ and $T_0$, $N^0$ and $f^0$ are the elements presented in Theorem
5.2 (see (5.3) and (5.4)) for the regular point $x^0$, then $f^0$, defined on $T_0(N^0) \times \Theta$,
has a nonvanishing extension $f$ over $T_0(\Omega') \times \Theta$ such that $f(T_0(x), \theta)$ is
analytic over $\Omega' \times \Theta$;

(3) (if $r > 0$) $T_0$ is almost everywhere a function of $T$ in $\Omega'$.

Proof: According to the first paragraph in the statement of Theorem 5.2,
the condition (1) implies immediately that $T$ is sufficient if $r = 0$. If $r > 0$, then,
again by Theorem 5.2, conditions (1) and (2) imply that $T_0$ is sufficient for $\mathcal{P}$ in
$\Omega'$. Thereupon, condition (3) gives, by the usual argument on the factorized form of the
family density function, that $T$ is sufficient.

This completes the proof.


We now finally give the specializations of our results to the case of identical
factor densities; thus, the next two results are the global counterparts of Corollaries
3.1 and 3.2, respectively. What is to be noted in particular is that we are able to state
necessary and sufficient conditions in the global case. And the reason for this is that
the analytic extension condition set forth in (iv) of Theorem 4.1 is automatically
fulfilled in the case of identical factor densities.


Corollary 5.1: Let $\mathcal{P}$ be the family of distributions defined in (1.2), wherein,
in particular, the $p_i$, $i = 1, 2, \ldots, n$, are identically the same function, say $p_0$. Let
$\Omega_0$ be a standard copy of the identical open sets $\Omega_1, \Omega_2, \ldots, \Omega_n$, so that
$p_0(\xi, \theta)$ is defined on $\Omega_0 \times \Theta$. Let $\Omega_0'$ be a subset of $\Omega_0$
(possibly $\Omega_0$ itself), which is an open, connected set (i.e., an open interval) and is such
that $p_0$ is analytic in $\Omega_0' \times \Theta$. Let $\Omega'$ be the $n$-fold direct product of
$\Omega_0'$ with itself; thus, $p(x, \theta) = p_0(x_1, \theta) \cdot p_0(x_2, \theta) \cdots p_0(x_n, \theta)$
is analytic in $\Omega' \times \Theta$.

A necessary and sufficient condition that there exists a statistic which is sufficient
for $\mathcal{P}$ in $\Omega'$ and is Euclidean of dimension $s < n$ in $\Omega'$, and which is
continuously differentiable about some regular point of $\Omega'$, is that for some integer
$r \le s$ the following be true:

(i) $p_0$ is of exponential type in $\Omega_0' \times \Theta$, as follows:

$$\log p_0(\xi, \theta) = b_0(\theta) + \psi_0(\xi) + \sum_{\lambda=1}^{r} b_\lambda(\theta)\,\psi_\lambda(\xi), \qquad (\xi, \theta) \in \Omega_0' \times \Theta, \tag{5.6}$$

with all functions appearing in this expression being analytic in $\Omega_0' \times \Theta$;

(ii) there is no constant linear relation among the $r$ functions $b_\lambda$, $\lambda = 1, 2, \ldots, r$,
in (5.6); that is, the expression on the right in (5.6) is in reduced form;

(iii) for a suitable representation (5.6) of $p_0$, the functions $b_\lambda$, $\lambda = 1, 2, \ldots, r$,
are constant linear combinations of the logarithmic derivative of $p_0$ evaluated at (say) the
first $r$ coordinates, $x_1^0, x_2^0, \ldots, x_r^0$, of some regular point $x^0$ in $\Omega'$, thus:

$$b_\lambda(\theta) = \beta_{\lambda 0} + \sum_{\rho=1}^{r} \beta_{\lambda\rho} \left( \frac{\partial \log p_0(\xi, \theta)}{\partial \xi} \right)_{\xi = x_\rho^0}, \qquad \lambda = 1, 2, \ldots, r, \tag{5.7}$$

wherein the matrix $\|\beta_{\lambda\rho}\|$ is nonsingular.

For $r = 0$ these three conditions are to be understood as follows: (5.7) is vacuous
and the $r$-term sum on the right-hand side of (5.6) is 0.

The integer $r$ for which the above properties are verified is unique and is precisely
the constant value of the function $\rho$ in $\Omega'$.

Proof: The necessity of these conditions, together with the last statement
of the theorem, follows immediately by specializing Theorem 5.1 to identical factor
densities.

To prove the sufficiency of the conditions, notice to begin with that, since
the present conditions are those of Theorem 5.1 for identical factor densities, the
hypothesis in the first paragraph of Theorem 5.2 is satisfied. Therefore, if $r = 0$,
the sufficiency of the above three conditions is trivial. Consider, then, $r > 0$. Ready
calculations on (5.6) give, in the present instance, the following more explicit form
of (5.3), in terms of the suitable $j_k$'s and $\theta$'s affirmed by Theorem 5.2:

$$\tau_k(x) = n \left( \frac{\partial b_0}{\partial \theta_{j_k}} \right)_{\theta = \theta^k} + \sum_{\lambda=1}^{r} \left( \frac{\partial b_\lambda}{\partial \theta_{j_k}} \right)_{\theta = \theta^k} \left( \sum_{i=1}^{n} \psi_\lambda(x_i) \right), \qquad k = 1, 2, \ldots, r, \tag{5.8}$$

and we have also from (5.6):

$$\log p(x, \theta) = n\, b_0(\theta) + \sum_{i=1}^{n} \psi_0(x_i) + \sum_{\lambda=1}^{r} b_\lambda(\theta) \left( \sum_{i=1}^{n} \psi_\lambda(x_i) \right). \tag{5.9}$$

Now, the suitable $j_k$'s and $\theta$'s being what they are (see Section 4), it follows that
the matrix

$$\left\| \left( \frac{\partial b_\lambda}{\partial \theta_{j_k}} \right)_{\theta = \theta^k} \right\|, \qquad \lambda, k = 1, 2, \ldots, r, \tag{5.10}$$

is nonsingular. Therefore, the equations (5.8) may be solved for the functions
$\sum_{i=1}^{n} \psi_\lambda(x_i)$, $\lambda = 1, 2, \ldots, r$, in terms of the functions $\tau_k$,
$k = 1, 2, \ldots, r$ (and this solution is valid over all of $\Omega'$). On substituting these solutions
into (5.9) we see that, $T_0$ being as defined in Theorem 5.2, there is a function $F$ such that

$$n\, b_0(\theta) + \sum_{\lambda=1}^{r} b_\lambda(\theta) \left( \sum_{i=1}^{n} \psi_\lambda(x_i) \right) = F(T_0(x), \theta), \qquad (x, \theta) \in \Omega' \times \Theta. \tag{5.11}$$

And if we set

$$e^{F(T_0(x), \theta)} = f(T_0(x), \theta), \qquad e^{\sum_{i=1}^{n} \psi_0(x_i)} = g(x), \tag{5.12}$$

then we have

$$p(x, \theta) = f(T_0(x), \theta)\, g(x), \qquad (x, \theta) \in \Omega' \times \Theta. \tag{5.13}$$

From our hypothesis we now see by (5.11) that $F(T_0(x), \theta)$ is analytic over $\Omega' \times \Theta$,
and therefore the same is true of $f(T_0(x), \theta)$. This function $f$, defined over
$T_0(\Omega') \times \Theta$, is a nonvanishing extension of its restriction to $T_0(N^0) \times \Theta$,
this latter providing the $f^0$ given in Theorem 5.2. Hence, we have verified that the analytic
extension condition in the last paragraph of Theorem 5.2 holds in the present identical-factor
situation. Consequently we have forthwith, by the last statement of Theorem 5.2, that there exists
a statistic which is sufficient for $\mathcal{P}$ in $\Omega'$, is Euclidean of dimension $s < n$ in
$\Omega'$ and is continuously differentiable about some regular point of $\Omega'$.

This concludes the proof that our conditions are, as asserted, sufficient; and
therewith the corollary.

Corollary 5.2: Let $\mathcal{P}$ be an identical-factor, product family of distributions
as described in the hypothesis of Corollary 5.1. And let $T$ be a statistic which is Euclidean
of dimension $s < n$ in $\Omega'$ and continuously differentiable about some regular point, $x^0$,
of $\Omega'$. Let $T_0$ again denote the statistic defined in Theorem 5.2.

A necessary and sufficient condition that $T$ be a sufficient statistic for $\mathcal{P}$ in $\Omega'$
is that the following be true:

(1) the conditions (i), (ii) and (iii) of Corollary 5.1 hold, and

(2) $T_0$ is almost everywhere a function of $T$ in $\Omega'$. Or, equivalently, each of
the functions (see (5.6))

$$\sum_{i=1}^{n} \psi_\lambda(x_i), \qquad \lambda = 1, 2, \ldots, r, \tag{5.14}$$

is almost everywhere a function of $T$ in $\Omega'$.

Proof: The equivalence noted in the last statement here is a consequence
of the discussion in the proof of Corollary 5.1, which showed that $T_0$ and the vector
function with components (5.14) are functions of each other in $\Omega'$.

If (1) above holds, then the proof of Corollary 5.1 shows that conditions (1)
and (2) of Theorem 5.5 are verified. And condition (2) above is identical with condition
(3) of that theorem. Hence, Theorem 5.5 establishes the sufficiency of the present
conditions (1) and (2).

Conversely, if $T$ is sufficient in $\Omega'$, then the present condition (1) holds by virtue
of Corollary 5.1. And then condition (2) above is established by Theorem 5.4.

This completes the proof of Corollary 5.2.
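As a concrete illustration of Corollaries 5.1 and 5.2, consider the familiar normal family with $\theta = (\mu, \sigma)$, for which $\log p_0$ has the exponential form (5.6) with $r = 2$, $\psi_1(\xi) = \xi$ and $\psi_2(\xi) = \xi^2$. The sketch below is only an illustration of the statement, not part of the paper's argument; the function names are hypothetical. It checks numerically that the log-likelihood depends on the sample only through the sums in (5.14).

```python
import math


def log_density(xi, mu, sigma):
    """log p_0(xi, theta) for the normal density with theta = (mu, sigma)."""
    return (-math.log(sigma) - 0.5 * math.log(2 * math.pi)
            - (xi - mu) ** 2 / (2 * sigma ** 2))


def exponential_form(mu, sigma):
    """Coefficients b_0, b_1, b_2 of the form (5.6), with psi_0 = 0,
    psi_1(xi) = xi and psi_2(xi) = xi**2 (an assumed, standard choice)."""
    b0 = -math.log(sigma) - 0.5 * math.log(2 * math.pi) - mu ** 2 / (2 * sigma ** 2)
    b1 = mu / sigma ** 2
    b2 = -1.0 / (2 * sigma ** 2)
    return b0, b1, b2


def sufficient_statistic(x):
    """The sums (5.14): (sum of psi_1(x_i), sum of psi_2(x_i))."""
    return (sum(x), sum(v * v for v in x))


def log_likelihood_direct(x, mu, sigma):
    return sum(log_density(v, mu, sigma) for v in x)


def log_likelihood_via_statistic(x, mu, sigma):
    """Right-hand side of (5.9), using only n and the statistic."""
    b0, b1, b2 = exponential_form(mu, sigma)
    t1, t2 = sufficient_statistic(x)
    return len(x) * b0 + b1 * t1 + b2 * t2
```

Two different samples with the same values of $\sum x_i$ and $\sum x_i^2$, for instance (0, 3, 3) and (1, 1, 4), give identical likelihoods for every $\theta$, which is the content of sufficiency in this case.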


This last corollary is Koopman's type of result. However, it differs in its
specifics from the actual statements of Koopman. As we have already remarked
earlier, the definition of a sufficient statistic that we are working with is the modern,
less restrictive one. In these circumstances, allowing ourselves the continuous
differentiability of $T$ at some regular point of $\Omega'$, instead of adhering to the class of
continuous statistics, we are able to obtain necessary and sufficient conditions for the
sufficiency of $T$. Koopman was obliged to state two separate theorems, one giving
necessary conditions and the other giving sufficient conditions.

STATISTICAL ESTIMATION OF DENSITY FUNCTIONS*


By M. S. BARTLETT 
University of Manchester 


SUMMARY. The optimum choice of weighting function for smoothing sample density func- 
tions is discussed in the cases : (i) probability densities ; (ii) spectral densities. It is shown that the bias 
contribution to the mean-square error can in principle be eliminated to any required order, though the 
resulting theoretical gain in efficiency may not be realised except for very large samples. The relation 
with recent work by Rosenblatt, Daniels and Parzen is noted. 

The further problem of estimating the spectra of stationary point processes is also considered. 


1. Various important examples of the estimation of density functions arise
in statistics, for example, (i) when a sample of independent observations is available
from a distribution with a continuous density $f(x)$, where

$$\int f(x)\,dx = 1;$$

(ii) when a time-series realisation of extent $T$ is available from a (real) stationary
stochastic process $X(t)$ with continuous spectral density $g(\omega) = \sigma^2 f_s(\omega)$, where

$$\int_0^\infty f_s(\omega)\,d\omega = 1.$$

For simplicity, we shall in place of (ii) consider the analogous problem for a
discrete-time realisation of extent $T = n$ with unit intervals, so that $f_s(\omega)$ is defined
up to $\omega = \pi$, and analogously for (i), assume that $f(x)$ is zero outside a finite interval
of known extent.

The second problem has received most attention in the literature, but the
comparability of the first problem was emphasized in an interesting paper by Rosenblatt
(1956). It is convenient to discuss problem (i) first, as in some ways simpler than
(ii), and with this object in view we recapitulate below some of the relevant formulae.
Finally, after returning to (ii), we extend the technique to cover the spectra of
autocovariance densities.

Let our estimate of a probability density $f(x)$ at $x$ be denoted by the functional

$$g[x; w_x(u)] = \int w_x(u)\,dF_n(u),$$

where $F_n(u)$ is the sample cumulative distribution function obtained from $n$ independent
observations.

*Presented in summarized form at the International Congress of Mathematicians, Stockholm, 1962.

Then we readily obtain

$$E[g] = \int w_x(u) f(u)\,du,$$

$$\operatorname{var}[g] = \iint w_x(u)\, w_x(v)\, \operatorname{cov}[dF_n(u), dF_n(v)] = \frac{1}{n} \left\{ \int w_x^2(u) f(u)\,du - \int w_x(u) f(u)\,du \int w_x(v) f(v)\,dv \right\}.$$

The mean square error $S$ is given by

$$S = \frac{1}{n} \int w_x^2(u) f(u)\,du - \frac{1}{n} \left[ \int w_x(u) f(u)\,du \right]^2 + \left[ \int w_x(u) f(u)\,du - f(x) \right]^2. \tag{1}$$

In order to study the behaviour of (1) further, Rosenblatt considers the case
where $f(u)$ is well-behaved near $x$, and can be written

$$f(u) = f(x) + (u - x) f'(x) + \tfrac{1}{2}(u - x)^2 f''(x) + \cdots.$$

Suppose also that for $x$ inside the range of $u$, we have

$$w_x(u) = w(x - u),$$

where $w(u)$ is an even function of $u$. Then if $w(u)$ is zero (or effectively zero) for $|u| > h$,
where $h$ is small, and we suppose further that

$$\int w(u)\,du = 1,$$

we have the first term on the right-hand side of (1) of order $1/(hn)$. Then if

$$\int u^2 w(u)\,du = v, \qquad \int w^2(u)\,du = W,$$

the dominant terms in (1) (for $v \ne 0$) are

$$S \sim \frac{W f(x)}{n} + \tfrac{1}{4} v^2 [f''(x)]^2. \tag{2}$$

For example, if $w(u)$ is $1/(2h)$ between $-h$ and $h$, we have $W = 1/(2h)$, $v = \tfrac{1}{3} h^2$, and
(2) becomes

$$S \sim \frac{f(x)}{2hn} + \frac{h^4}{36} [f''(x)]^2. \tag{3}$$

Rosenblatt notes that if we put $h = c n^{-\alpha}$, $\alpha$ should for large $n$ be taken to be $\tfrac{1}{5}$,
and

$$c^5 = \frac{9 f(x)}{2 [f''(x)]^2},$$

when (3) becomes

$$S \sim \frac{5}{8} \left( \frac{2}{9} \right)^{1/5} [f(x)]^{4/5}\, |f''(x)|^{2/5}\, n^{-4/5}. \tag{4}$$

2. However, if under the conditions for which (2) is valid we consider $w(u)$
to be an arbitrary (and not necessarily, as Rosenblatt assumed, a non-negative) function
which is to be optimised, it is readily found (for example, by the method used for
an analogous problem in Bartlett and Medhi (1955)) that it should be of the form

$$w(u) = C(1 - u^2/a^2).$$

Now by choice of $a$, we can here make $v$ zero; we find $a^2 = \tfrac{3}{5} h^2$, whence
$C = W = 9/(8h)$, and

$$S \sim \frac{9 f(x)}{8hn}, \tag{5}$$

a formula in which, by taking $h$ small but fixed, we can apparently restore the result
$S = O(n^{-1})$. Of course, this is not precisely true, for we must now include the next
order term in the bias

$$\frac{1}{4!} f^{(iv)}(x) \int u^4 w(u)\,du.$$

The 'fourth moment' of $w(u)$, $V$, say, is in the above case $-9h^4/105$, whence more
accurately

$$S \sim \frac{9 f(x)}{8hn} + \left[ \tfrac{1}{24} V f^{(iv)}(x) \right]^2. \tag{6}$$

After inclusion of this term, however, we should go back and modify our optimum
function $w(u)$ to arrange for $V$ to be zero as well as $v$, and so on indefinitely. In this
way, the result $S = O(n^{-1})$ can be approached.*
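The moments of the rectangular and parabolic weighting functions can be checked by exact integration. The following is a minimal sketch, assuming both weights vanish outside $|u| \le h$ and that the parabolic weight is $w(u) = C(1 - u^2/a^2)$ with $a^2 = \tfrac{3}{5}h^2$ and $C = 9/(8h)$; the helper names are hypothetical and not taken from the paper.

```python
from fractions import Fraction


def integrate_even(poly, h):
    """Integral over [-h, h] of sum_k poly[k] * u**k; odd powers contribute 0."""
    total = Fraction(0)
    for power, coeff in poly.items():
        if power % 2 == 0:
            total += 2 * coeff * h ** (power + 1) / (power + 1)
    return total


def multiply(p, q):
    """Product of two polynomials stored as {power: coefficient}."""
    out = {}
    for i, a in p.items():
        for j, b in q.items():
            out[i + j] = out.get(i + j, Fraction(0)) + a * b
    return out


def rectangular(h):
    """w(u) = 1/(2h) on [-h, h]."""
    return {0: 1 / (2 * h)}


def parabolic(h):
    """w(u) = (9/(8h)) * (1 - 5u^2/(3h^2)) on [-h, h]."""
    c = Fraction(9) / (8 * h)
    return {0: c, 2: -c * 5 / (3 * h ** 2)}


def moments(w, h):
    """Total mass, second moment v, W = integral of w^2, fourth moment V."""
    return {
        "total": integrate_even(w, h),
        "v": integrate_even(multiply(w, {2: Fraction(1)}), h),
        "W": integrate_even(multiply(w, w), h),
        "V": integrate_even(multiply(w, {4: Fraction(1)}), h),
    }
```

For $h = 1$ the rectangular weight gives $v = 1/3$ and $W = 1/2$, while the parabolic weight gives total mass 1, $v = 0$, $W = 9/8$ and a negative fourth moment $V = -3/35$ (that is, $-3h^4/35 = -9h^4/105$ in general), so only the variance term $9f(x)/(8hn)$ and the fourth-order bias remain.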
It seems doubtful whether such a repeated correction would be practically
superior over the first correction given by

$$w(u) = \frac{9}{8h} \left( 1 - \frac{5u^2}{3h^2} \right), \tag{7}$$

which we will compare with the rectangular weighting function of Section 1. Let us
consider, for example, the estimation of the mode of the normal distribution with
density function

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/(2\sigma^2)},$$

for which we note that

$$f(0) = \frac{1}{\sigma\sqrt{2\pi}}, \qquad f''(0) = -\frac{1}{\sigma^3\sqrt{2\pi}}, \qquad f^{(iv)}(0) = \frac{3}{\sigma^5\sqrt{2\pi}}.$$

*The technique suggested here appears related to an independent and more mathematical
discussion by Parzen (1961), so I might perhaps note that it was first mentioned in a course on
time series at University College, London, in the academic year 1960-61. The successive use of
orthogonal polynomials should also be compared with the procedure for estimating spectral
densities proposed by Daniels, which, however, relates orthogonal polynomials with the density
function rather than with the weighting function. For rather a different approach using prior
distributions, see Whittle (1958). Cf. also the paper by Priestley (1962).

Thus we choose $h = (4.5\sqrt{2\pi}/n)^{1/5}\,\sigma$, and

$$\sqrt{S}/f(0) \sim \tfrac{1}{2}\sqrt{5} \Big/ \sqrt{2hn f(0)}. \tag{8}$$

For this fraction to be 0.1, say, we require $n = 303$. From (6) we then have for
comparison

$$\sqrt{S}/f(0) \sim \tfrac{3}{2} \Big/ \sqrt{2hn f(0)}. \tag{9}$$

Within limits, $h$ may be taken as large as we please, but for definiteness we choose the
same numerical value as in (8) for both $h$ and $n$, whence the value in (9) becomes 0.134,
thus stressing the obvious point that asymptotic behaviour has no relevance to samples
of fixed size. However, for fixed $h$ the value in (9) will eventually become smaller than
the value in (8) provided $h$ in (8) is taken to be $(4.5\sqrt{2\pi}/n)^{1/5}\,\sigma$. For example, for $n$
32 times larger, the value in (8) becomes 0.025, whereas that in (9) becomes 0.0237. Of
course, in this last case we must check that the neglected contribution in (9) due to
$f^{(iv)}(0)$ is in fact still negligible, but this may be easily checked from the more accurate
expression (6).
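The figures quoted in this comparison can be reproduced directly from formulae (3) and (6). The sketch below assumes a normal density with mode at 0 and uses $V = -3h^4/35$ for the fourth moment of the parabolic weight (7); the function name and return layout are illustrative only.

```python
import math


def normal_mode_comparison(n, sigma=1.0, h=None):
    """Relative root-mean-square errors sqrt(S)/f(0) at the mode of N(0, sigma^2).

    Returns (h_opt, rect, parab_lead, parab_full):
      h_opt      - the bandwidth (4.5*sqrt(2*pi)/n)**(1/5) * sigma,
      rect       - rectangular weight at h_opt, from formula (3),
      parab_lead - parabolic weight (7) at bandwidth h, leading term only,
      parab_full - the same including the fourth-order bias term of (6).
    If h is None the parabolic weight also uses h_opt.
    """
    f0 = 1.0 / (sigma * math.sqrt(2 * math.pi))
    f2 = -f0 / sigma ** 2          # f''(0)
    f4 = 3.0 * f0 / sigma ** 4     # f^(iv)(0)
    h_opt = (4.5 * math.sqrt(2 * math.pi) / n) ** 0.2 * sigma
    if h is None:
        h = h_opt
    s_rect = f0 / (2 * h_opt * n) + h_opt ** 4 * f2 ** 2 / 36
    v4 = -3.0 * h ** 4 / 35        # fourth moment V of the parabolic weight
    s_lead = 9 * f0 / (8 * h * n)
    s_full = s_lead + (v4 * f4 / 24) ** 2
    return h_opt, math.sqrt(s_rect) / f0, math.sqrt(s_lead) / f0, math.sqrt(s_full) / f0
```

Calling `normal_mode_comparison(303)` gives values close to 0.100 and 0.134, and `normal_mode_comparison(32 * 303, h=h303)` with the bandwidth held at its $n = 303$ value gives about 0.025 for the rectangular weight and about 0.0237 for the parabolic one, with the fourth-order bias term changing the latter only in the fifth decimal place.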


3. It is evident that a similar approach will be possible for the estimation of
a spectral density $f_s(\omega)$, the only differences being (a) a slight difference in the
functional dependence on $f_s(\omega)$ and (b) the fact that the moment formulae are only
asymptotically valid.

We consider for simplicity the related density function

$$p(\lambda) = 2\pi\sigma^2 f_s(\lambda),$$

and write as our estimate of $p(\lambda)$ at $\lambda$ the quantity

$$g(\lambda; w_\lambda(\omega)) = \int_0^\pi I_n(\omega)\, w_\lambda(\omega)\,d\omega, \tag{10}$$

where $I_n(\omega)$ is the usual periodogram intensity at the point $\omega$ obtained from a sequence
of values $X_1, X_2, \ldots, X_n$. It is possible to express (10) to a sufficient approximation
equivalently as

$$\frac{2\pi}{n} \sum_p I_n(\omega_p)\, w_\lambda(\omega_p),$$

where $\sum_p$ denotes summation over $I_n(\omega) w_\lambda(\omega)$ evaluated at the discrete steps
$\omega_p = 2\pi p/n$ ($p = 0, 1, \ldots, \tfrac{1}{2}n$, for $n$ even with appropriate corrections at 0 and $\pi$).

Then under suitable regularity conditions on $f_s(\omega)$ and $w_\lambda(\omega)$ it is well known that

$$E[g(\lambda)] \sim \int_0^\pi p(\omega)\, w_\lambda(\omega)\,d\omega, \tag{11}$$

$$\operatorname{var}[g(\lambda)] \sim \frac{2\pi}{n} \int_0^\pi p^2(\omega)\, w_\lambda^2(\omega)\,d\omega, \tag{12}$$

whence the mean-square error $S$ is given by

$$S \sim \frac{2\pi}{n} \int_0^\pi p^2(\omega)\, w_\lambda^2(\omega)\,d\omega + \left[ \int_0^\pi p(\omega)\, w_\lambda(\omega)\,d\omega - p(\lambda) \right]^2. \tag{13}$$

For the uniform weighting function $w_\lambda(\omega) = 1/(2h)$, ($\omega$ in $\lambda - h$ to $\lambda + h$), we
have similarly to (3) and (4) (cf. Grenander and Rosenblatt, 1957, p. 154)

$$S \sim \frac{\pi p^2(\lambda)}{nh} + \frac{h^4}{36} [p''(\lambda)]^2 \tag{14}$$

$$\sim \frac{5\,\pi^{4/5}\, [p(\lambda)]^{8/5}\, |p''(\lambda)|^{2/5}}{4 \cdot 9^{1/5}\, n^{4/5}} \tag{15}$$

for $h = c\,n^{-1/5}$, with $c = \left[ 9\pi p^2(\lambda) / \{p''(\lambda)\}^2 \right]^{1/5}$.

Again, however, by choice of $w_\lambda(\omega)$ we can make the bias vanish to any desired
order. Keeping for simplicity to the same function as in (7), we write

$$w_\lambda(\omega) = \frac{9}{8h} \left\{ 1 - \frac{5(\omega - \lambda)^2}{3h^2} \right\}. \tag{16}$$

For a parabolic density law, at least in the neighbourhood of $\lambda$, this weighting function
will, as for the probability density estimation problem, give a mean-square error
decreasing ultimately as $1/n$, in contrast with the $n^{-4/5}$ of formula (15). For example,
for the parabolic density function ... $p''(\lambda) = 32/8$, and from (15)

$$\sqrt{S}/p(\lambda) \sim 1.013\, n^{-2/5}. \tag{17}$$

For this fraction to be 0.1, we require $n = 537$. For (16), we have then (for the same
value of $h$) a fraction 0.134, a value greater than 0.1. Nevertheless, for $n = 32 \times 537$,
the value in (17) is 0.025, while the weighting function (16) for fixed $h$ gives now a
value for $\sqrt{S}/p(\lambda)$ of 0.0237, analogously to the probability density case.

4. If $I_n(\omega)$ is available at integral values of $p$, where $\omega_p = 2\pi p/n$, then
the weighting function (16) becomes only approximately correct, and values
appropriate to the number of discrete points in the interval $-h$ to $h$ may be substituted.
These are found from the analogous conditions that the sum of the weights
is unity and the orthogonality with $(\omega - \lambda)^2$ preserved. For example, for 8 points
in the interval the weights are (one half of the set, which is symmetric about $\lambda$):

  -3/32    3/32    7/32    9/32

and for 16 points in the interval:

  -91/1344   -21/1344   39/1344   89/1344   129/1344   159/1344   179/1344   189/1344

It is recalled that in the case of estimating spectral densities various formulae
have been proposed in terms of the autocovariance or autocorrelation function, the
main purpose (at least by myself) being to save computation. Now, however, that
large-scale computers are often available, there is much to be said for working directly
with the periodogram. The modified estimate discussed above will then sometimes
be of interest, especially if it is important to reduce the effect of bias; but the value of
the uniform weighting function, as first proposed in this context by Daniell (1946),
should also be emphasized. It is especially efficient for testing departure from a
uniform spectrum, giving rise in this case to sampling quantities distributed
asymptotically as $\chi^2$'s.
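The discrete weights of Section 4 can be derived mechanically. The sketch below is one possible reading, not the author's program: it assumes weights of the parabolic form $A - Bx_j^2$ at $k$ equally spaced points placed symmetrically about $\lambda$ (offsets $\pm\tfrac12, \pm\tfrac32, \ldots$ for even $k$), with the two conditions that the weights sum to unity and are orthogonal to $x^2$. Under these assumptions the 16-point case has common denominator 1344, in agreement with the denominators printed in the text. The function names are hypothetical.

```python
from fractions import Fraction


def parabolic_weights(k):
    """Weights A - B*x**2 at k symmetric offsets (k >= 2), with
    sum(w) = 1 and sum(w * x**2) = 0, computed exactly."""
    offsets = [Fraction(2 * j - k + 1, 2) for j in range(k)]
    s0 = k
    s2 = sum(x ** 2 for x in offsets)
    s4 = sum(x ** 4 for x in offsets)
    det = s0 * s4 - s2 ** 2          # determinant of the two linear conditions
    a = s4 / det
    b = s2 / det
    return [a - b * x * x for x in offsets]


def smooth_blocks(values, weights):
    """Weighted sums over consecutive, non-overlapping blocks of
    len(weights) periodogram ordinates."""
    k = len(weights)
    return [
        sum(w * v for w, v in zip(weights, values[i:i + k]))
        for i in range(0, len(values) - k + 1, k)
    ]
```

With 256 periodogram ordinates and 16 weights, `smooth_blocks` returns 16 smoothed values; replacing the weights by `[Fraction(1, 16)] * 16` gives the uniform (Daniell) smoothing for comparison.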

5. Further examples of density functions arise in the theory of point processes,
for which 'product density' or 'factorial-moment density' functions are defined (see
Bartlett, 1955, Section 3.42) and may require estimation. For example, in the case of a
stationary point process $dN(t)$, the second-order density function defines a 'covariance
density' function

$$\mu(\tau) = \frac{E[dN(t + \tau)\,dN(t)]}{(dt)^2} - \lambda^2, \qquad (\tau > 0), \tag{18}$$

where

$$\lambda = \frac{E[dN(t)]}{dt}, \tag{19}$$

analogous to the covariance function $w(\tau)$ of a stationary process $X(t)$. The sampling
properties of an estimate of $\mu(\tau)$ can, like those of an estimate of $w(\tau)$, be investigated,
but are rather complicated; and it seems more convenient to discuss the estimation
of the Fourier transform of $\mu(\tau)$, which is the analogue of the spectrum corresponding
to $w(\tau)$ (see Bartlett, 1955a, Section 6.12).

It will be shown that periodogram intensities $I_N(\omega)$ may be defined with similar
sampling properties to those of $I_n(\omega)$, so that the smoothing technique developed
in Section 3 above will be readily applicable to this further problem.

We define

$$g(\omega) = \int_{-\infty}^{\infty} e^{i\omega\tau} \mu(\tau)\,d\tau. \tag{20}$$

Note that $\mu(\tau)$ is only defined in (18) above for $\tau > 0$. It is a symmetric function,
and for $\tau = 0$ we have

$$E[\{dN(t)\}^2] = E[dN(t)] = \lambda\,dt, \tag{21}$$

so that the integral (20) has a contribution $\lambda$ at $\tau = 0$ and $g(\omega)$ is of the form

$$g(\omega) = \lambda + g_\mu(\omega). \tag{22}$$

For $\omega$ defined for non-negative values only, we write

$$g_+(\omega) = 2\lambda + 2 g_\mu(\omega). \tag{23}$$

For events at random times $T_1, T_2, \ldots, T_n$ in the interval $(0, T)$ we define

$$J(\omega) = \sqrt{\frac{2}{T}} \sum_{s=1}^{n} e^{i\omega T_s} = \sqrt{\frac{2}{T}} \int_0^T e^{i\omega t}\,dN(t). \tag{24}$$

We have, analogously to the properties of an ordinary periodogram intensity $I_n(\omega)$,

$$E[I(\omega)] = E[J(\omega) J^*(\omega)] = \frac{2}{T} \left[ \int_0^T \lambda\,dt + \int_0^T\!\!\int_0^T e^{i\omega(u - v)} \mu(u - v)\,du\,dv \right] \tag{25}$$

$$\sim g_+(\omega) \text{ for large } T.$$
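Definition (24) is easy to try out numerically. The sketch below is an illustration under stated assumptions, not the computation reported in the paper: it simulates a purely random (Poisson) process with rate $\lambda$, for which $\mu(\tau) = 0$ and hence $g_+(\omega) = 2\lambda$, and averages $I(\omega) = |J(\omega)|^2$ over the frequencies $\omega_p = 2\pi p/T$. The function names, the seed and the choice of frequencies are hypothetical.

```python
import cmath
import math
import random


def simulate_poisson_times(rate, horizon, seed=0):
    """Event times of a homogeneous Poisson process on (0, horizon)."""
    rng = random.Random(seed)
    times = []
    t = rng.expovariate(rate)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def point_periodogram(times, horizon, omega):
    """I(omega) = |J(omega)|^2 with J(omega) = sqrt(2/T) * sum_s exp(i*omega*T_s)."""
    j = math.sqrt(2.0 / horizon) * sum(cmath.exp(1j * omega * t) for t in times)
    return abs(j) ** 2


def mean_intensity_check(rate=1.0, horizon=2000.0, n_freq=400, seed=0):
    """Average of I(omega_p) over p = 1..n_freq, and the sample value 2N(T)/T."""
    times = simulate_poisson_times(rate, horizon, seed)
    values = [
        point_periodogram(times, horizon, 2 * math.pi * p / horizon)
        for p in range(1, n_freq + 1)
    ]
    return sum(values) / len(values), 2 * len(times) / horizon
```

For a unit-rate process both returned numbers should lie near 2; individual ordinates $I(\omega_p)$ fluctuate about this level with a standard deviation of the same order as their mean, which is why smoothing over neighbouring frequencies is needed.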


To investigate the sampling properties of $I_n(\omega)$ it is usual (e.g. Bartlett, 1955a,
Section 9.2) to assume that $X(t)$ is a 'linear process.' The analogous interpretation of the
point process $N(t)$ will be that it can be regarded as a Poisson process with rate $\Lambda$
which is itself a stationary linear process $\Lambda(t)$ with mean $\lambda$. It then follows (Bartlett,
1955b) that the relation between the characteristic functional of $N(t)$ and $\Lambda(t)$ is

$$E_N\!\left[ \exp\, i\!\int \theta(t)\,dN(t) \right] = E_\Lambda\!\left[ \exp \int \Lambda(t) \{ e^{i\theta(t)} - 1 \}\,dt \right]. \tag{26}$$

This relation includes for example the result contained in the equation leading to (25)

$$E[I(\omega)] = 2\lambda + E[I_\Lambda(\omega)], \tag{27}$$

where

$$I_\Lambda(\omega) = J_\Lambda(\omega) J_\Lambda^*(\omega), \qquad J_\Lambda(\omega) = \sqrt{\frac{2}{T}} \int_0^T \Lambda(t)\, e^{i\omega t}\,dt,$$

and $\mu(\tau)$ is now also the autocovariance function of $\Lambda(t)$.

In (26) choose

$$i\theta(t) = \sqrt{2/T}\,(\theta_1 e^{i\omega t} + \theta_2 e^{-i\omega t}).$$

Then by expanding $e^{i\theta(t)}$ on the right-hand side of (26) we find

$$E[\exp\{\theta_1 J(\omega) + \theta_2 J^*(\omega)\}] = E_\Lambda\{ \exp[\theta_1 J_\Lambda(\omega) + \theta_2 J_\Lambda^*(\omega) + 2\lambda\theta_1\theta_2] + O(1/\sqrt{T}) \},$$

or, by taking logarithms to obtain cumulant functions,

$$K(\theta_1, \theta_2) \sim K_\Lambda(\theta_1, \theta_2) + 2\lambda\theta_1\theta_2, \tag{28}$$

where the cumulant function on the left refers to $J(\omega)$, $J^*(\omega)$, and that on the right
to $J_\Lambda(\omega)$, $J_\Lambda^*(\omega)$.

The use of complex quantities may obscure the interpretation of the extra
term $2\lambda\theta_1\theta_2$ in (28), but this result implies that $J(\omega)$ is asymptotically equivalent to
$J_\Lambda(\omega)$ apart from the addition of an independent (complex) component with real and
imaginary parts each uncorrelated normal with zero mean and variance $\lambda$. We
deduce the following asymptotic results in addition to (25).

$$\operatorname{var}[I(\omega)] = E[I^2(\omega)] - \{E[I(\omega)]\}^2 = E[J^2(\omega) J^{*2}(\omega)] - \{E[I(\omega)]\}^2$$

$$\sim \kappa_{22} + \kappa_{20}\kappa_{02} + \kappa_{11}^2 \sim \kappa_{11}^2 \sim \{E[I(\omega)]\}^2, \tag{29}$$

where these cumulants relate to $K(\theta_1, \theta_2)$ and it is noted that $E[J(\omega)] \sim 0$ for $\omega \ne 0$.
The last line then follows because it is known to be true for $J_\Lambda(\omega)$ and $I_\Lambda(\omega)$; from (28)
$\kappa_{22}$, $\kappa_{20}$ and $\kappa_{02}$ are also zero for the extra component, whose cumulants are additive
to those of $J_\Lambda(\omega)$.

Notice that the extra component adds to the variance of $I(\omega)$ as well as to its
mean, so that the fluctuations of $I(\omega)$ are similar to those in standard periodogram
analysis. We may consider

$$I'(\omega) = I(\omega) - 2N(T)/T, \tag{30}$$

where the second term corrects for the extra component by means of a sample estimate,
but the variance is not thereby reduced.

That we may regard $I(\omega)$ as having similar asymptotic properties to $I_n(\omega)$
has only been shown above at one point $\omega$, and in particular for the variance; but it may
similarly be shown for two points, and the covariance of $I(\omega)$ and $I(\omega')$ ($\omega, \omega' \ne 0$),
by writing in (26)

$$i\theta(t) = \sqrt{2/T}\,(\theta_1 e^{i\omega t} + \theta_2 e^{-i\omega t} + \theta_3 e^{i\omega' t} + \theta_4 e^{-i\omega' t}).$$

It should perhaps be added that the asymptotic behaviour of $I(\omega)$ for large
$T$ for a particular $\omega$, or pair $\omega, \omega'$, could be deduced similarly for any finite set of
separate $\omega$'s, but could not be expected to hold for a set of increasing number without
further conditions. This limitation applies in the standard periodogram case, but as
$\omega$ is unrestricted in range raises some further queries. For example, for a particular
set of observations, $t_1, t_2, \ldots, t_N$, some effective upper limit on the range of $\omega$ is required,
but no precise discussion of this point is attempted (cf. also the statistical analysis in
Section 6).


6. To illustrate the technique of Section 5, two examples were analysed. (I)
The first set of data consists of times at which 129 successive vehicles passed a point
on a road;* (II) the second is an artificial realization of a purely random or Poisson
process, with a mean interval of unity.

*I am indebted for these figures to Dr. A. J. Miller, who had supplied them as a class example for
demonstrating other types of statistical analysis I have suggested for such data. The programming of
the computer calculations is due to Mr. D. Walley.

In Example I the total time elapsing is 20235 (in units of 1/10 second), giving
an average interval of 158.1. This was reduced to the order of unity by taking 160
as unit time interval. Thus if the recorded times are $t_0, t_1, \ldots, t_{128}$, the scaled
time-intervals

$$T_i = (t_i - t_0)/160, \qquad (i = 1, \ldots, 128),$$

were used, and $J(\omega_p)$ defined as

$$J(\omega_p) = A(\omega_p) + iB(\omega_p) = \sqrt{\frac{2}{128}} \sum_{s=1}^{128} e^{i\omega_p T_s}.$$

The values of $\omega_p$ chosen were of the form $2\pi p/N$, where $N$ and $p$ are integers. In
standard periodogram analysis, $N$ would be of order $n$, where $n$ is the sample length,
and was thus chosen to be 128. Moreover, the range of $p$, when $N \sim n$, is $\tfrac{1}{2}n$ ($\omega_p \le \pi$).
Correspondingly, in the present type of problem, we should take the range of $p$ to be
at least $\tfrac{1}{2}N$, or 64. It was taken to be 4 times this amount, as a reasonable compromise
between taking too many values of $p$ and taking a high enough maximum for the bulk
of the relevant variation of the spectrum to be included.

In Example II the artificial series also consisted of 128 time-intervals, but
these did not require any preliminary scaling. The values of $N$ and the range of $p$
were taken as for Example I. A more detailed record and discussion of this analysis
will be given elsewhere, but a summary of the analysis is provided by Tables 1 and 2.
These provide smoothed $I(\omega)$ estimates for each example (i) using a uniform weighting
function over 16 points (ii) using a parabolic weighting function over 16 points (with
weights as given in Section 4). It should of course be remembered that in Example II
the spectral function is known to be constant (with mean 2), and the uniform weighting
function would necessarily be superior to the parabolic, as is suggested by the values.
In Example I it was known by previous analyses that the process was not a Poisson process.
This would be tested on the present approach with the uniform weighting function
(for which the smoothed $I(\omega)$ are approximately distributed proportionally to $\chi^2$
with 32 degrees of freedom); but once the non-constancy of the spectral function is
accepted, the use of the parabolic weighting function might be preferred as giving
a less biased, and under certain conditions more accurate, estimate of the true
function. The value of $2N(T)/T$ should be noted in this case, viz. 2.024.

TABLE 1. ESTIMATED SPECTRAL VALUES FOR TRAFFIC DATA

       (i)     (ii)           (i)          (ii)               (i)     (ii)           (i)     (ii)
  1   4.702   5.970      5   2.974        1.761          9   2.136   2.478     13   2.128   3.086
  2   3.120   2.790      6   2.353        3.139         10   2.029   1.246     14   1.118   0.590
  3   4.380   2.876      7   2.706        2.467         11   1.701   1.879     15   1.163   1.495
  4   2.761   2.628      8   [illegible]  [illegible]   12   2.477   1.507     16   1.788   1.507

TABLE 2. ESTIMATED SPECTRAL VALUES FOR ARTIFICIAL SERIES

       (i)     (ii)           (i)     (ii)           (i)      (ii)           (i)     (ii)
  1   1.740   1.497      5   1.647   1.823      9   1.598    1.686     13   2.038   1.642
  2   2.561   3.425      6   2.146   2.573     10   1.7384   2.072     14   2.202   2.909
  3   1.606   1.101      7   1.499   1.597     11   1.980    1.958     15   1.779   0.789
  4   3.323   4.027      8   1.621   1.21      12   0.708    0.662     16   1.699   1.293

SOME LIMIT THEOREMS FOR THE DODGE-ROMIG AOQL
SINGLE SAMPLING INSPECTION PLANS* 


By A. HALD and E. KOUSGAARD 
University of Copenhagen 


1. INTRODUCTION 


In a previous paper by Hald (1962) limit theorems for the Dodge-Romig 
LTPD single sampling inspection plans have been derived. The purpose of the present 
paper is to find similar results for the AOQL plans. Dodge and Romig (1941) did not 
1 their paper where they derived the equations for the AOQL 
plans and tabulated such plans for lot sizes up to 100.000. Both from theoretical 
and practical points of view it is, however, interesting to develop an explicit asymptotic 
solution to Dodge and Romig's equations and study to what extent this solution 
is valid for finite lots. 

The main results are that the highest allowable fraction defective in the 
sample converges to the AOQL, the difference being of order y/log n/4/n (Theorem 2), 
and that sample size asymptotically is proportional to the logarithm of lot size 
(Theorem 3). It is further shown that the producers risk asymptotically decreases 
inversely proportional to lot size and that the average emoun: of inspection for lots of 
process average quality apart from sampling inspection is independent of lot size. 
Finally, numerical investigations have shown that the asymptotic formulas for accep- 
and sample size are good approximations to the exact solution also for 
pact graphical representation of the asymptotic solution 


consider this problem ir 


tance number 
small lot sizes and a com 
is given. 

From a purely P 
result regarding the Pois: 


robabilistie point of view the most interesting is perhaps the 
son distribution stated in Theorem 1. 


9. RELATION BETWEEN THE EXACT AND ASYMPTOTIO SOLUTIONS 
pë A 


as been kept as close as practical to Dodge and Romig's. The 
average outgoing quality limit, pz, the process average 
p, and the number of items in the lot, N. The problem is to deter- 
of items in the sample, n, and the acceptance number, c, from the 
rements : (1) The maximum value of the average fraction defective 
th total inspection of rejected lots, replacing all defective 
all be equal to pz. (2) The average number of items ins- 
quality shall be a minimum, assuming that the 


The notation h 
given parameters are the 
fraction defective, 
mine the number 
following two requi E 
after sampling inspection W? 
items found by good ones; sh 
pected per lot of process average 
remainder of rejected lots is inspected. 


* 1 in part at the Statistical Techniques Research Group, Princeton University, with 
e i, 5) VË “n NGJIBHQ— 
i serrave Research, and under ONR Contract Number N62558-3073. Reproduction 
ee on ta perimittë d for any purpose of the United States Government, 
n whole or in p 
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For incoming lots of size N from a binomially controlled process with fraction 
defective equal to p the average probability of acceptance is 


and the average fraction defective after inspection becomes approximately 


I 


pa = (1-7) pB(c, n, p). ve (la) 


The approximation involved consists in replacing the average number of 
defectives found in accepted lots by np which is usually a negligible error. 


For $p_L < 0.10$ Dodge and Romig have used the cumulative Poisson distribution, $B(c, np)$, instead of the binomial, i.e. (1a) is replaced by

$p_A = (1 - n/N)\, p\, B(c, np)$.    ... (1b)

The value of $p$ maximizing $p_A$ is found from the equation $dp_A/dp = 0$ which leads to

$B(c, x) = x\, b(c, x)$,  $x = np$,    ... (2)

where $b(c, x) = e^{-x}x^{c}/c!$ denotes the Poisson probability and $\max_p p_A = p_L$ for $p = p_1$.


Introducing the auxiliary variables $m = np_L$ and $M = Np_L$ we find from (1b) for $p = p_1$

$m = (1 - m/M)\, x\, B(c, x)$.    ... (3)

Equation (2) determines a relation between $x$ and $c$ which may be used to eliminate $x$ from (3). Thus, from (3) we may find $m$ as a function of $c$ and $M$, $m = m_{c,M}$ say.
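The elimination of x from (3) by means of (2) can be sketched numerically. The following is only an illustrative sketch, not the computation used by Dodge and Romig; the function names `solve_x`, `aoql_factor` and `m_exact` are hypothetical, and the root of (2) is found by plain bisection on the assumption that it is the only sign change in the bracket used.

```python
import math


def poisson_pmf(c, x):
    """Poisson probability b(c, x) = exp(-x) x^c / c!."""
    return math.exp(-x + c * math.log(x) - math.lgamma(c + 1))


def poisson_cdf(c, x):
    """Cumulative Poisson probability B(c, x)."""
    return sum(poisson_pmf(k, x) for k in range(c + 1))


def solve_x(c, tol=1e-12):
    """Root of B(c, x) - x b(c, x) = 0, equation (2), by bisection."""
    lo, hi = 1e-9, c + 10.0  # assumed bracket: positive at lo, negative at hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if poisson_cdf(c, mid) - mid * poisson_pmf(c, mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def aoql_factor(c):
    """x B(c, x) at the root of (2): the value of m for an infinite lot."""
    x = solve_x(c)
    return x * poisson_cdf(c, x)


def m_exact(c, M):
    """Solve m = (1 - m/M) x B(c, x), equation (3), for m."""
    y = aoql_factor(c)
    return y / (1.0 + y / M)
```

For c = 0 equation (2) reduces to x = 1, so `aoql_factor(0)` returns exp(-1), the value 0.3679 quoted later in the text for this case.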


The average number of items inspected per lot of process average quality is

$I(\bar p) = n + (N - n)(1 - B(c, n, \bar p))$.    ... (4a)

Approximating the binomial by the corresponding Poisson distribution and introducing $z = I(\bar p)\,p_L$ and $r = \bar p/p_L$, we obtain

$z = m + (M - m)(1 - B(c, rm)) = M - (M - m)B(c, rm)$    ... (4b)

which is proportional to the function minimized by Dodge and Romig for $r < 1$ and $p_L < 0.10$.

It should be noted that $m$, $M$, $z$, and $r$ here are defined similarly as in the paper by Hald (1962) with the natural modification that $p_t$ has been replaced by $p_L$.
The problem of minimizing $z$ with respect to $c$ when $m = m_{c,M}$ is formally the same as the one treated by Hald (1962). The upper boundary of the zone in the $(M, r)$ plane with acceptance number $c$ is therefore given by the equation

$M = \dfrac{m_{c+1,M}\,B(c+1,\, r m_{c+1,M}) - m_{c,M}\,B(c,\, r m_{c,M})}{B(c+1,\, r m_{c+1,M}) - B(c,\, r m_{c,M})} = M(c)$,    ... (5)

which means that for all $M$ in the interval $(M(c-1), M(c))$ we have $z(c) < z(c+i)$, $i = \pm 1, \pm 2, \ldots$, whereas for $M = M(c)$ we have $z(c) = z(c+1) < z(c+i)$, $i = -1, +2, +3, \ldots$

The asymptotic solution is obtained by treating $(c, n, N)$ as continuous and approximating $B(c, rm)$ by a differentiable function. Replacing the difference equation $\Delta z = 0$ for the determination of $M(c)$ by the differential equation $dz/dc = 0$ we find $M = \tilde M(c)$ and the corresponding approximate relation $M(c) = \tilde M(c + \tfrac12)$.
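The exact minimization described by (2), (3) and (4b) can be illustrated by direct enumeration over c. This is a sketch under the Poisson approximation only; the function names and the search limit `c_max` are assumptions made for the illustration and do not appear in the paper.

```python
import math


def poisson_pmf(c, x):
    """Poisson probability b(c, x)."""
    return math.exp(-x + c * math.log(x) - math.lgamma(c + 1))


def poisson_cdf(c, x):
    """Cumulative Poisson probability B(c, x)."""
    return sum(poisson_pmf(k, x) for k in range(c + 1))


def m_exact(c, M):
    """m = m_{c,M}: eliminate x from (3) by means of (2) using bisection."""
    lo, hi = 1e-9, c + 10.0  # assumed bracket for the root of (2)
    while hi - lo > 1e-12:
        mid = 0.5 * (lo + hi)
        if poisson_cdf(c, mid) - mid * poisson_pmf(c, mid) > 0:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)
    y = x * poisson_cdf(c, x)
    return y / (1.0 + y / M)


def z_value(c, M, r):
    """z = M - (M - m) B(c, r m), equation (4b), with m = m_{c,M}."""
    m = m_exact(c, M)
    return M - (M - m) * poisson_cdf(c, r * m)


def optimal_c(M, r, c_max=60):
    """Acceptance number minimizing z over 0..c_max (c_max is an assumed cap)."""
    return min(range(c_max + 1), key=lambda c: z_value(c, M, r))
```

With r = 0.1 this enumeration changes from c = 0 to c = 1 between M = 13 and M = 14, in agreement with the first column of Table 3, and it gives c = 1 for M = 50 as in Table 4.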


Instead of using (2) and (3) for eliminating $m$ from (4b) we might just as well have eliminated $c$ and minimized $z$ with respect to $m$.

The procedure in the following will be to rewrite (2) as an integral and then find a manageable relation between $c$ and $x$ from an asymptotic expansion of the integral. By means of this relation $x$ is eliminated from (3) and $c$ is found as a function of $m$ which is used to eliminate $c$ from (4b). From $dz/dm = 0$ we finally find an expansion for $M$ in terms of $m$.


3. RELATION BETWEEN SAMPLE SIZE AND ACCEPTANCE NUMBER
Theorem 1: The equation

$B(c, x) = x\, b(c, x)$    ... (6)

has the asymptotic solution

$c = x + \sqrt{x \log\dfrac{x}{2\pi}} + \dfrac{1}{6}\log\dfrac{x}{2\pi} - \dfrac12 + o(1)$.    ... (7)


Proof: By means of the relation between the cumulative Poisson distribution and the incomplete Gamma function we may rewrite (6) as

$\dfrac{1}{c!}\displaystyle\int_x^\infty z^{c} e^{-z}\,dz = \dfrac{x}{c!}\, e^{-x} x^{c}$.

Putting $z = x(1+t)$ we find

$F(c, x) = \displaystyle\int_0^\infty (1+t)^{c} e^{-xt}\,dt = 1$.    ... (8)

Introducing $c = x(1+\delta)$ we want to determine $\delta = \delta_x$ for $x \to \infty$.

First we show that $\delta \to 0$ but at a slower rate than $1/\sqrt{x}$. Let us suppose that asymptotically $c - x \sim \alpha\sqrt{x}$, i.e. $\delta \sim \alpha/\sqrt{x}$, where $\alpha$ is positive and finite. From the central limit theorem it then follows that the left hand side of (6) tends to $\Phi(\alpha)$ whereas the right hand side is of the order $\sqrt{x}$ and thus tends to infinity. Consequently the assumption $\delta \sim \alpha/\sqrt{x}$ leads to a contradiction. Suppose next that $c - x \sim \alpha x$, i.e. $\delta \sim \alpha$, where $\alpha$ is positive and finite. This means that the standardised normal deviate becomes $(c-x)/\sqrt{x} \sim \alpha\sqrt{x}$. It then follows from a formula by Blackwell and Hodges (1959) that the left hand side of (6) tends exponentially to 1 whereas the right hand side tends exponentially to zero. Thus also this assumption leads to a contradiction. Since the two assumptions, $\delta \sim \alpha/\sqrt{x}$ and $\delta \sim \alpha$, lead to discrepancies between the left and the right hand sides of (6) going in opposite directions we have that $\delta \to 0$ at a slower rate than $1/\sqrt{x}$.


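Rewriting (8) we find

$F(c, x) = \displaystyle\int_0^\infty e^{x g(t)}\,dt$    ... (9)

with

$g(t) = (1+\delta)\log(1+t) - t$,  $t \ge 0$,    ... (10)

and $g^{(\nu)}(t) = (-1)^{\nu-1}(\nu-1)!\,(1+\delta)(1+t)^{-\nu}$ for $\nu \ge 2$.

It follows that $g(t)$ is increasing for $0 < t < \delta$ and decreasing for $t > \delta$, that $g(0) = 0$, $\max g(t) = g(\delta) = (1+\delta)\log(1+\delta) - \delta > 0$, $g(t) = 0$ for $t \simeq 2\delta$, and $g(t) \to -\infty$ for $t \to \infty$.

Since $\delta \to 0$ the essential contribution to $F(c, x)$ for $x \to \infty$ comes from the neighbourhood of $t = \delta$ where we have

$x g(t) = x g(\delta) + x \displaystyle\sum_{\nu=2}^{\infty} \dfrac{(-1)^{\nu-1}}{\nu}\,(1+\delta)^{-(\nu-1)}(t-\delta)^{\nu}$.

Introducing

$u = (t-\delta)\sqrt{x/(1+\delta)}$    ... (11)

we find

$x g(t) = x g(\delta) - \tfrac12 u^2 + \displaystyle\sum_{\nu=3}^{\infty} \dfrac{(-1)^{\nu-1}}{\nu}\,((1+\delta)x)^{-\nu/2+1} u^{\nu}$.    ... (12)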
ves 


From the properties of g(t) described above and from the fact that d— 0 it 
follows that it is always possible for any fixed t = tf, > 0 to find a = Xo say, so that 
glto) < 0 for all a> Zo Which implies that 

to 
F(c, x) = f esodi1--O(e)) for a> Zo: 
0 
Changing the variable in the i 


hite ket ntegral from £ to u defined by (11) and using 
EN ae, (to—8) Nal VIFS 
Fe, x) gj (1+8) eats) j gudu (140 ( t )) ve (18) 
bana vë 
—ë ya] NIFS 


where ¢(w) denotes the standardised normal frequency function 
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To get a first approximation to ô we observe that for 60 and x= the 
main terms of log F becomes 


log F(c, x) = —} log a Hed ead i (14) 
. 62 KLOS 04 
since g(6) = To 


From (8) we have log F = 0 which leads to 
1 x 
on = log ase (15) 
By means of this result and Mill’s ratio we can evaluate the integral in (13) which 
1 
becomes asymptotically 1—5. 
Expanding the three terms of log F we finally find 
1 Wels orl caper laces LAN 
log Fle, 2) = — log pt dë n0— që +0(=) EHO) pease) 


from which $ may be determined by successive approximations leading to the result 


s= 4 log ote log se —este(a): 


From this we immediately get Theorem 1. 
For $M \to \infty$ we may disregard $m/M$ in (3). In the following we shall therefore first solve the equation

$m = x B(c, x)$.    ... (17)

From this solution we may then find the solution, $m^*$ say, to (3) as

$m^* = \dfrac{m}{1 + m/M} = \dfrac{Mm}{M+m}$.    ... (18)

(It should be noted that $m$ in (17) is Dodge and Romig's $y$.)

To evaluate $B(c, x)$ we use the normal approximation to the Poisson distribution together with Mill's ratio which give

$B(c, x) \simeq \Phi\!\left(\sqrt{\log\dfrac{x}{2\pi}}\right) \simeq 1 - \dfrac{1}{\sqrt{x \log(x/2\pi)}}$.    ... (19)

According to (17) this should be equal to $m/x$. A comparison with the values tabulated by Dodge and Romig shows that a considerably better approximation may be obtained by adding a further term equal to $1.9/x$. (It should be noted that this term has not been derived from an asymptotic expansion.) From (17) we then get

$m \simeq x - \sqrt{\dfrac{x}{\log(x/2\pi)}} + 1.9$.    ... (20)

By inversion it is found to the same degree of approximation that

$x \simeq m + \sqrt{\dfrac{m}{\log(m/2\pi)}} - 1.9$.    ... (21)

Combining (7) and (21) we find the following theorem.


Theorem 2: Asymptotically the acceptance number $c$ may be expressed as the following function of $m = np_L$,

$c = m + \sqrt{m \log\dfrac{m}{2\pi}} + \sqrt{\dfrac{m}{\log(m/2\pi)}} + \dfrac16 \log\dfrac{m}{2\pi} - 1.9$.    ... (22)
For the LTPD system we found

$\dfrac{c}{n} \simeq p_t - 1.28\sqrt{\dfrac{p_t}{n}}$,    ... (23)

i.e. $c/n$ converges to $p_t$ from below, the difference as usual converging to zero as $1/\sqrt{n}$ because a point $(p_t, 0.10)$ of the OC-curve for the sampling plan has been fixed.

The result for the AOQL system just proved shows that

$\dfrac{c}{n} \simeq p_L + \sqrt{\dfrac{p_L}{n} \log\dfrac{np_L}{2\pi}}$,    ... (24)

i.e. $c/n$ converges to $p_L$ from above but the difference converges to zero at a much slower rate than $1/\sqrt{n}$ because fixing the maximum of the AOQ curve is a requirement essentially different from fixing a point of the OC-curve.


Considered as a limit theorem the first few terms of (22) are sufficient. However, since we also want to develop formulas which may be used for finite lots, (22) has the drawback that it is not defined for $m < 2\pi$. Preserving the asymptotic properties we have therefore modified (22) to the following which is defined for $m > 0.5$:

$c = m + \sqrt{m - 0.5}\,\left(\sqrt{v} + 1/\sqrt{v}\right) + 0.5v - 1.9$,    ... (25)

where $v = \log(1 + m/2\pi)$.

Table 1 shows the relation between $c$ and $m$. Comparing with the corresponding Dodge-Romig table it will be seen that the approximate $m$ according to (25) deviates at most 0.07 from Dodge and Romig's for $1 \le c \le 40$ which is the tabulated range. The conclusion is that (25) gives a rather accurate solution to equations (2) and (17) apart from the case $c = 0$ which has the solution $m = 0.3679$.
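Formula (25) is easy to evaluate and to invert numerically. The sketch below is illustrative only: the function names are hypothetical, and the inversion uses bisection on the assumption that c increases with m over the bracket, which is an assumption of this sketch rather than a statement in the paper.

```python
import math


def c_from_m(m):
    """Acceptance number c as a function of m = n p_L according to (25)."""
    v = math.log(1.0 + m / (2.0 * math.pi))
    return (m + math.sqrt(m - 0.5) * (math.sqrt(v) + 1.0 / math.sqrt(v))
            + 0.5 * v - 1.9)


def m_from_c(c, lo=0.5 + 1e-9, hi=1.0e4, tol=1e-10):
    """Invert (25) by bisection; the bracket [lo, hi] is an assumed range."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if c_from_m(mid) < c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for c in (0.5, 5.5, 6.0, 6.5):
        print(c, round(m_from_c(c), 2))
```

Evaluating `c_from_m` at the tabulated values m = 3.45 and m = 37.99 returns values within about 0.01 of the corresponding entries c = 5.5 and c = 50.0 of Table 1.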
TABLE 1. RELATION BETWEEN m AND c ACCORDING TO (25)

    c       m        c       m        c       m

                   17.0   11.68    34.0   24.9
   0.5    0.73     17.5   12.06    34.5   25.34
   1.0    0.89     18.0   12.44    35.0   25.74
   1.5    1.09     18.5   12.82    35.5   26.14
   2.0    1.33     19.0   13.19    36.0   26.54
   2.5    1.59     19.5   13.57    36.5   26.95
   3.0    1.88     20.0   13.96    37.0   27.35
   3.5    2.17     20.5   14.34    37.5   27.75
   4.0    2.48     21.0   14.72    38.0   28.16
   4.5    2.80     21.5   15.10    38.5   28.56
   5.0    3.12     22.0   15.49    39.0   28.97
   5.5    3.45     22.5   15.88    39.5   29.37
   6.0    3.78     23.0   16.26    40.0   29.78
   6.5    4.12     23.5   16.65    40.5   30.19
   7.0    4.46     24.0   17.04    41.0   30.59
   7.5    4.80     24.5   17.43    41.5   31.00
   8.0    5.15     25.0   17.81    42.0   31.41
   8.5    5.49     25.5   18.20    42.5   31.82
   9.0    5.85     26.0   18.60    43.0   32.23
   9.5    6.20     26.5   18.99    43.5   32.64
  10.0    6.55     27.0   19.38    44.0   33.04
  10.5    6.91     27.5   19.77    44.5   33.45
  11.0    7.27     28.0   20.17    45.0   33.86
  11.5    7.63     28.5   20.56    45.5   34.28
  12.0    7.99     29.0   20.95    46.0   34.69
  12.5    8.35     29.5   21.35    46.5   35.10
  13.0    8.72     30.0   21.75    47.0   35.51
  13.5    9.08     30.5   22.14    47.5   35.92
  14.0    9.45     31.0   22.54    48.0   36.33
  14.5    9.82     31.5   22.94    48.5   36.75
  15.0   10.19     32.0   23.34    49.0   37.16
  15.5   10.56     32.5   23.74    49.5   37.57
  16.0   10.94     33.0   24.13    50.0   37.99

Mm 
JËS L= 
m = Mim PL 


4. RELATION BETWEEN LOT SIZE AND SAMPLE SIZE

From Theorem 2 it follows that $c$ tends to infinity with $m = np_L$. We therefore need an asymptotic expression for the Poisson distribution with parameter $\bar m = n\bar p$, $\bar p < p_L$, for $\bar m$ and $c - n\bar p$ both tending to infinity. The required result is given by the following lemma:

Lemma: For $c > m$ we have asymptotically

$1 - B(c, m) = \dfrac{m}{c-m}\,\dfrac{1}{\sqrt{2\pi c}}\; e^{-m + c - c\log(c/m)}\,\left(1 + O((c-m)^{-1})\right)$.    ... (26)

This lemma is a special case of a theorem given by Blackwell and Hodges (1959).
From (4b) we have

$z = m + (M - m)(1 - B(c, rm)) = m + (M - m) f(m)$,    ... (27)

say, since $c$ is a function of $m$ given by (25). Minimizing $z$ with respect to $m$ leads to the equation

$dz/dm = 1 + (M - m) f'(m) - f(m) = 0$

from which we find

$\log(M - m) = -\log f(m) - \log(-d\log f(m)/dm) + \log(1 - f(m))$.    ... (28)

This relation gives us $M$ as a function of $m$. In the following we shall derive asymptotic expansions for each of the three terms on the right hand side of (28), discarding all terms which are $o(1)$. This leads to the following theorem.

Theorem 3: The asymptotic relation between sample size and lot size is given by

$\log M = m(r - \log r - 1) - \sqrt{m - 0.5}\,\left(\sqrt{v} + 1/\sqrt{v}\right)\log r$
$\qquad + \tfrac12 v(1 - \log r) + \tfrac12 \log m - \log(r - \log r - 1 + 1/m)$
$\qquad + 0.85\log r - 0.26\log(1 - r) + \log\sqrt{2\pi} - 0.1 + 1/2m$,    ... (29)

where $v = \log(1 + m/2\pi)$.


Proof: For convenience we write in the following

$\varepsilon = \sqrt{\dfrac{\log(1 + m/2\pi)}{m - 0.5}}$,    ... (30)

so that (25) may be written as

$c = (m - 0.5)\left(1 + \varepsilon + \tfrac12\varepsilon^2 + \dfrac{1}{(m-0.5)\varepsilon} - \dfrac{1.4}{m - 0.5}\right)$


i ELE pë 2.4 1 
E mos Ecoe moet a 
and log L — ] £ BO L 
E im rri m05? Ey 
be eis 1 2.9 


Introducing

$g(m) = rm - c + c\log\dfrac{c}{rm} = m\left[r - \dfrac{c}{m}(1 + \log r) + \dfrac{c}{m}\log\dfrac{c}{m}\right]$

we find

$g(m) = m(r - \log r - 1) - \left[(m - 0.5)\varepsilon + \dfrac{1}{\varepsilon}\right]\log r$
$\qquad + \tfrac12 (m - 0.5)\varepsilon^2 (1 - \log r) + 1 + 1.9\log r + o(1)$.    ... (31)

Since

$\dfrac{rm}{c - rm} = \dfrac{r}{1 - r}\,(1 + O(\varepsilon))$

and

$\sqrt{2\pi c} = \sqrt{2\pi m}\,(1 + O(\varepsilon))$

we finally get from (26) that

$f(m) = \dfrac{r}{1 - r}\,\dfrac{1}{\sqrt{2\pi m}}\; e^{-g(m)}\,(1 + O(\varepsilon))$    ... (32)

from which follows that

$-\log f(m) = g(m) + \tfrac12\log m + \log\sqrt{2\pi} - \log\dfrac{r}{1 - r} + o(1)$,    ... (33)

$-d\log f(m)/dm = (r - \log r - 1) + o(1)$    ... (34)

and $\log(1 - f(m)) = -f(m) + o(f(m)) = o(1)$.

Inserting these results into (28) we find

$\log(M - m) = m(r - \log r - 1) - \left[(m - 0.5)\varepsilon + \dfrac{1}{\varepsilon}\right]\log r$
$\qquad + \tfrac12 (1 - \log r)(m - 0.5)\varepsilon^2 + \tfrac12\log m$
$\qquad - \log(r - \log r - 1) + 1 + 0.9\log r + \log(1 - r) + \log\sqrt{2\pi}$    ... (35)

which is identical to (29) apart from the last (constant) term. Numerical investigations have proved that a better approximation is obtained for small $m$ by changing the last term as indicated in (29) and adding a further term equal to $1/2m$.


As explained in Section 2 we may use formulas (18), (25), and (29) to obtain approximations to the Dodge-Romig plans by putting $c$ equal to 0.5, 1.5, ..., solving (25) for $m$ by means of Table 1, and finding intervals for $M$ (corresponding to every integer value of $c$) from (29).

Suppose we want to determine the interval for $M$ corresponding to $c = 6$ and $r = 0.9$. From Table 1 we read $m_1 = 3.45$ for $c = 5.5$ and $m_2 = 4.12$ for $c = 6.5$. Inserting these values into (29) we find $M_1 = 53$ and $M_2 = 72$. Thus, for $53 \le M < 72$ we use $c = 6$ and $np_L = m^* = 3.78M/(M + 3.78)$. Numerical results will be discussed in Section 7.
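The worked example for c = 6 and r = 0.9 can be retraced with a few lines of code. This is a sketch only; the function names `log_M` and `m_star` are hypothetical, natural logarithms are used throughout as in (29), and the values of m are the ones read from Table 1.

```python
import math


def log_M(m, r):
    """Natural logarithm of M = N p_L as a function of m and r, formula (29)."""
    v = math.log(1.0 + m / (2.0 * math.pi))
    k = r - math.log(r) - 1.0
    return (m * k
            - math.sqrt(m - 0.5) * (math.sqrt(v) + 1.0 / math.sqrt(v)) * math.log(r)
            + 0.5 * v * (1.0 - math.log(r))
            + 0.5 * math.log(m)
            - math.log(k + 1.0 / m)
            + 0.85 * math.log(r)
            - 0.26 * math.log(1.0 - r)
            + math.log(math.sqrt(2.0 * math.pi))
            - 0.1
            + 1.0 / (2.0 * m))


def m_star(m, M):
    """Finite-lot value n p_L = m* = M m / (M + m), formula (18)."""
    return M * m / (M + m)


if __name__ == "__main__":
    r = 0.9
    M1 = math.exp(log_M(3.45, r))   # m read from Table 1 for c = 5.5
    M2 = math.exp(log_M(4.12, r))   # m read from Table 1 for c = 6.5
    print(round(M1), round(M2))     # interval of M for c = 6
    print(m_star(3.78, 60.0))       # n p_L for c = 6 and an assumed M = 60
```

The lot size M = 60 in the last line is an arbitrary value inside the interval, chosen only to show how (18) is applied.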
A formal inversion of (29) has been carried out but the result is not suitable for practical purposes since the convergence of the series is very slow. A "numerical inversion" is given in Section 7.

Comparing the asymptotic formulas for the two systems of Dodge-Romig plans it will be seen that an essential difference comes from the second term of the expansions which is of order $\sqrt{m}$ for the LTPD plans but of order $\sqrt{m \log m}$ for the AOQL plans.


5. THE PROBABILITY OF ACCEPTANCE

Since $c/n \to p_L$ we shall consider the operating characteristic in the neighbourhood of $p_L$. By means of the normal approximation to the Poisson distribution we find the probability of acceptance as

$P(p) \simeq B(c, np) \simeq \Phi\!\left(\dfrac{c - np}{\sqrt{np}}\right)$  $(p \simeq p_L)$.

Using (24) we find the value of $p$ giving a specified acceptance probability $P$ as

$p \simeq p_L\left(1 + \sqrt{\dfrac{\log(np_L/2\pi)}{np_L}} - \dfrac{u_P}{\sqrt{np_L}}\right)$,    ... (36)

where $u_P$ denotes the $P$ fractile of the normal distribution. It follows that the OC-curve tends to become vertical through $p = p_L$ but that the convergence to the limiting form is very slow.

From (28) and (34) we obtain the producer's risk

$1 - P(\bar p) = f(m) \simeq \dfrac{1}{M(r - \log r - 1)}$,    ... (37)

i.e. asymptotically the producer's risk decreases inversely proportional to lot size.

This result is the same as for the LTPD plans. It should be noted, however, that (37) gives a rather poor approximation to the producer's risk unless $M$ is very large and $r$ is small.

As stated by Hald (1962) without proof there exists a simple asymptotic relation between $p_L$ and $p_t$ for the LTPD plans showing that $p_L \to p_t$. The proof depends on Theorem 2. Similarly, it follows for the AOQL plans that $p_t \to p_L$, $p_t$ being defined as the quality having an acceptance probability of 0.10. From (36) we get for $P = 0.10$ and $m_t = np_t$

$m_t \simeq m + \sqrt{m \log\dfrac{m}{2\pi}} + 1.28\sqrt{m}$.
Since this formula cannot be used for $m < 2\pi$ it has been taken as model for an improved formula in which $m/2\pi$ has been replaced by $1 + m/2\pi$. Retaining the coefficients of the three terms above, changing to logarithms with base 10 and adding a fourth term we find the following rather accurate formula


m= m2 m cee (1432) +128 Vin+6229/)0g,, (14-2) 


where m = npr. 


6. MINIMUM AMOUNT OF INSPECTION PER LOT 


From (27), (28), and (34) we find

$z = I(\bar p)\,p_L \simeq m + 1/(r - \log r - 1)$

or

$I(\bar p) - n \simeq \dfrac{1}{-p_L \log r - (p_L - \bar p)}$,    ... (38)

i.e. the average amount of inspection per lot of process average quality apart from sampling inspection is asymptotically independent of lot size.

This result is also similar to the one found for the LTPD plans. It means that the relative amount of inspection of "rejected" lots of process average quality as compared to sampling inspection tends to zero as $1/\log M$.


7. NUMERICAL INVESTIGATIONS 


The relation (29) between $M$ and $m$ has been drawn on semi-logarithmic paper for $r = 0.1, 0.2, \ldots, 0.9$, and an inversion of (29) has then been obtained by fitting curves of the form

$m = \gamma_1 \log M + \gamma_2 \sqrt{\log M} + \gamma_3 \log\log M + \gamma_4$,    ... (39)

using suitably selected points. The coefficients found have been expressed as functions of $r$ in the following way:
2,3220r + 0.0280/(1—7)—0.0897 


log Y= 7 
Zy) — 2.5660r+0.0306/(1—7)+0.3645 

w 3.05177--0.0325 (1—r)--0.91 58 (1--r)—0.7090 

vr 0.8082 (1--r)—0.0412. 


log y4 = 2.7291r-+0.0307/(1—1) + / 
Table 2 contains a tabulation of these coefficients together with the coefficients in formula (29) after transforming all logarithms to base 10.
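The numerical inversion (39) can be tried out with coefficients copied from Table 2. The sketch below is illustrative: only the columns for r = 0.4 and r = 0.5 are included, the dictionary `GAMMA` and the function name are hypothetical, and logarithms are taken to base 10 as stated in the heading of Table 2.

```python
import math

# Coefficients (gamma_1, gamma_2, gamma_3, gamma_4) copied from Table 2.
GAMMA = {
    0.4: (7.687, -27.663, 16.595, 20.874),
    0.5: (13.406, -51.133, 31.073, 38.745),
}


def m_from_M(M, r):
    """m = n p_L from M = N p_L by formula (39), base-10 logarithms."""
    g1, g2, g3, g4 = GAMMA[r]
    log_m = math.log10(M)
    return g1 * log_m + g2 * math.sqrt(log_m) + g3 * math.log10(log_m) + g4


if __name__ == "__main__":
    for M in (100, 1000):
        print(M, round(m_from_M(M, 0.5), 2))
```

For r = 0.5 this gives values that round to the Table 5 entries 2.60 (M = 100) and 5.22 (M = 1000). Because the four terms largely cancel, the coefficients have to be used with all the decimals given in Table 2.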

é roximation to (29) depends ony. For N < 100.000 

i racy of (39) as an approxima Fona 
vej vo he error in m will be less than 0.20, for 7 = 0.80 less than 0.30, 
Kë ai 90 SË less than 0.40 although oceasionally as high as 0.70. For 
i më nxë pos the error will be considerably less than the figures stated, 


most values of i 
otic formulas should not be used for small values of M, ie. M 


: : . : e MM F 
ehea For such values the exact solution has been given in Table 3. 


less than about 15. 
TABLE 2. COEFFICIENTS $\beta_1$ TO $\beta_8$ FOR COMPUTING $\log M = \log Np_L$ FROM $m$ AND $r = \bar p/p_L$ ACCORDING TO (29) AND COEFFICIENTS $\gamma_1$ TO $\gamma_4$ FOR COMPUTING $m$ FROM $M$ AND $r$ ACCORDING TO (39). ($\log = \log_{10}$ AND $\ln = \log_e$)

      coefficient of                   r = 0.1      0.2       0.3       0.4       0.5       0.6       0.7       0.8       0.9

β1    m                                 0.6091    0.3515    0.2189    0.1374    0.0839    0.0481    0.0246    0.0101    0.0023
β2    sqrt((m-1/2) log(1+m/2π))         1.5174    1.0606    0.7934    0.6038    0.4568    0.3366    0.2351    0.1471    0.0694
β3    sqrt((m-1/2) / log(1+m/2π))       0.6590    0.4606    0.3446    0.2622    0.1984    0.1462    0.1021    0.0639    0.0302
β4    log(1+m/2π)                       1.6513    1.3047    1.1020    0.9581    0.8466    0.7554    0.6783    0.6116    0.5527
β5    log m                             0.5000    0.5000    0.5000    0.5000    0.5000    0.5000    0.5000    0.5000    0.5000
β6    log(r - ln r - 1 + 1/m)          -1.0000   -1.0000   -1.0000   -1.0000   -1.0000   -1.0000   -1.0000   -1.0000   -1.0000
β7    1                                -0.4824   -0.2133   -0.0485    0.0751    0.1781    0.2706    0.3599    0.4550    0.5768
β8    1/m                               0.2171    0.2171    0.2171    0.2171    0.2171    0.2171    0.2171    0.2171    0.2171
γ1    log M                             1.491     2.569     4.435     7.687    13.406    23.632    42.563    80.891   190.590
γ2    sqrt(log M)                      -4.520    -8.240   -15.067   -27.663   -51.133   -95.631  -183.105  -371.792  -954.773
γ3    log log M                         2.916     5.071     9.066    16.595    31.073    59.655   118.639   253.302   699.085
γ4    1                                 3.494     6.271    11.380    20.874    38.745    73.074   141.590   291.834   763.170


TABLE 3. TABLE OF c AS FUNCTION OF M = Np, AND r=plPy, FOR M<15 


— — tt t$9 OO ————— 
x 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 
0 1-13 1-7 1-5 1-3 1-3 1-2 1-2 1-2 1-2 
1 14-15 8-16 6-15 4-15 4-13 38-10 3-8 3—7 3—6 
: u- Tis 9-15 8—15 7—13 


For values of (M, r) not given in the table the asymptotic formulas may be used. 


For c — 0 use the following m = m(M) 


= 3 4-5 6-8 9-13 


0.34 0.35 0.36 


ti Forc > 0 use mt from Table 1. 
AET ae tein të E yon z numerical comparison of the exact and the 
aaor SST Dele qa E it i ae pomala to read the exact solution with sufficient 
rather coarse lot pës Se hi graphs and their tables give sampling plans for & 
of the c aot a ve aed = contains, however, some characteristic results 
piparisi which has been carried out for tabulated Dodge-Romig plans cor- 
responding to upper end-points of lot size intervals. The ble shows ri dh M 
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the tabulated Dodge-Romig plan, the “asymptotic plan”, the average amount of 


inspection, and the AOQL. It will be seen that there is no essential difference between 


the two sets of plans. 
TABLE 4, COMPARISON OF “ASYMPTOTIC PLANS” ACCORDING TO (39) AND TABULATED 
DODGE-ROMIG PLANS 


Dodge and Romig asympt. i - 
100p; ” M c m e mi ip, IA(p)pz, 10092 100p£ 
0.50 0.10 50 1 0.83 1 0.87 0.99 1.05 0.50 0.47 

500 2 1.38 2 1.33 1.68 1.861 0.50 0.51 

0.50 20 2 1.28 2 1.25 1.79 1.73 0.50 0.51 
50 3 1.88 3 1.81 2.63 2.47 0.50 0.52 

500 6 3.78 $ 4.42 5.44 5.43 0.50 0.50 

0.90 20 $ 78 3 1.72 3.22 3.03 0.50 0.52 
50 5 2.98 5 2.94 5.58 5.41 0.50 0.51 

16 10.7 15 10.0 20.4 20.8 0.50 0.50 


500 
1.10 1.17 2.0 1.9 


0.84 


Se E 
2.0 0.10 80 1 1 0.88 

200 2 140 2 1.82 1.48 1.39 1.9 2.1 

2000 3 1.90 ai 1:88) iziga 2-97, 2.0 2.1 

0.50 10 1 10:78 g “osë. LI 2.0 1.9 

20 2 1.80 @  -1.2£ 1.88 17 3:0) nee 

80 3 1.90 4 2.40 3.16 3.00 20 21 

200 5 310 5 3.08 4.12 4.07 2:0. = 2.0 

2000 9 5.80 10 6.52 7.51 7.68 2.0 2.0 

0.90 20 3 1.80 3 1.72 3.29 3.03 310) 7 a 

80 7 4.20 6 3.60 7.17 7.20 2.0 2.0 

200 11 7.00 10 6.34 12.4 12.5 2.0 2.0 

1 0.8 1 0.9 0.95 1.09 10.3 9.2 

0 Dë Br 2 1.4 2 1.3 156 "1.43 9.8 10.5 

10000 3 1.9 4 2.5 2.37 2.67 10.2 10.2 

0.50 50 3 1.9 3 1.8 2.67 2.45 9.8 10.4 

i 100 4 2.5 4 2.4 3.39 3.16 9.9 10.8 

400 6 3.8 6 3.7 5.17 4.88 9.9 10.2 

1000 8 5.0 8 5.1 6.13 6.41 10.2 10.0 

10000 12 8.0 13 8.7 10.7 10.5 9.9 10.0 

1.8 3 1.7 3.28 2.97 9.8 10.5 

ia x : 3.0 5 2.9 6.67 5.24 9.9 10.3 

100 s 4.9 7 4.3 8.35 8.49 10.0 10.0 

400 15 10.0 14 9.2 18.6 18.0 9.9 10.0 

1000 23 16.0 al lag 28.5 28.9 10.0 10.0 


Figure 1 shows the relation between $M$ and $m$ according to (39) for $r = 0.1, 0.2, \ldots, 0.9$. The relation (25) between $m$ and $c$ has been given on the vertical scale. From given values of $M = Np_L$ and $r = \bar p/p_L$ we may thus read $(m, c)$ on Fig. 1 and then compute $np_L = m^*$ from (18). To make it easy for the reader to draw his own diagram the necessary values of $(M, m)$ are given in Table 5.

SANKHYA : THE INDIAN JOURNAL OF STATISTICS : SERIES A i 


[Figure 1: chart titled "AOQL single sampling inspection plans", $M = Np_L$, $r = \bar p/p_L$.]

Fig. 1. Relation between M, m, and c according to formulas (39) and (25).


TABLE 5. TABLE OF m AS FUNCTION OF $M = Np_L$ AND $r = \bar p/p_L$ ACCORDING TO FORMULA (39)

M r=0.10 0.20 0.30 0.40 0.50 0.60 0.70 0.80 0.90 
15 0.5 0.71 0.89 1.08 1.25 1.3 1.43 1.61 1.14 
20 0. 0.79 1.00 1.22 1.41 1.5 1.67 1.95 1.99 
40 0. 1.00 1.27 1.57 1.86 2. 2.30 2.69 sult 
100 0. 1.28 1.67 2.12 2.60 3. 3.48 4.07 4.54 
200 K 1.52 2.01 2.61 3.27 3. 4.71 5.67 6.43 
400 1. 1.77 2.38 3.15 4.05 5. 6.25 7.78 9.30 
1000 Je 2.13 2.91 3.94 5.22 6. 8.74 11.40 14.77
2000 le 2.41 3.35 4.60 6.21 8. 10.95 14.73 20.19 
4000 një 2.71 3.81 5.30 7.28 9.9 13.42 18.56 26.69 
10000 2. 3.12 4.44 6.29 8.81 12. 17.06 24.32 36.88
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LONG-CHAIN POLYMERS AND SELF-AVOIDING 
RANDOM WALKS II 


By J. M. HAMMERSLEY 
University of Oxford 


SUMMARY. This continuation of a previous paper discusses the dependence of the connective constant upon dimensionality.

Hammersley (1963) gave the background and notation for self-avoiding Pólya walks. This paper is a direct continuation with equations and theorems numbered serially (so that references to equations (1) to (62) are in fact references to this previous paper).

We define

$\lambda(d) = e^{\kappa}$,    ... (63)

where $\kappa$ is the connective constant introduced in (6); and we study the dependence of $\lambda$ upon $d$.

To emphasize our concern with $d$, we write $F_n^{(d)}$ for the class of all self-avoiding $n$-stepped Pólya walks on a $d$-dimensional lattice, starting from the origin. We write $f_n^{(d)}$ for the number of distinct members of $F_n^{(d)}$; and we define $f_0^{(d)} = 1$. We introduce the generating function

$\phi^{(d)}(x) = \displaystyle\sum_{n=0}^{\infty} f_n^{(d)} x^n$,  $|x| < 1/\lambda(d)$.    ... (64)

In fact, it follows from (6) and (63) that $1/\lambda(d)$ is the radius of convergence of $\phi^{(d)}(x)$.
Let $\delta$ be an integer satisfying $1 \le \delta < d$. Any member of $F_n^{(d)}$ can be uniquely represented as an ordered sequence of $n$ unit vectors $V_1, V_2, \ldots, V_n$ in $d$-dimensional space, where $V_i$ is the vector joining the $(i-1)$-th to the $i$-th point of the walk. Some of these vectors $V_i$ may lie in the subspace spanned by the first $\delta$ coordinate axes. Let $V_{i_1}, V_{i_2}, \ldots, V_{i_m}$ be the subsequence of all vectors with this property. If and only if $V_{i_1}, V_{i_2}, \ldots, V_{i_m}$ is a member of $F_m^{(\delta)}$, we say that the original walk $V_1, V_2, \ldots, V_n$ is a member of $F_n^{(d,\delta)}$. This is a definition of the class $F_n^{(d,\delta)}$. In this definition we adopt the convention that $F_0^{(\delta)}$ is the empty set of vectors; and consequently a walk of $F_n^{(d)}$ belongs to $F_n^{(d,\delta)}$ if none of its vectors $V_i$ are parallel to any of the first $\delta$ coordinate axes. We define

$\phi^{(d,\delta)}(x) = \displaystyle\sum_{n=0}^{\infty} f_n^{(d,\delta)} x^n$,    ... (65)

with the usual gloss that $f_0^{(d,\delta)} = 1$.

Theorem 11:

$\phi^{(d)}(x) \ge \phi^{(d,\delta)}(x) = \phi^{(d-\delta)}(x)\,\phi^{(\delta)}\!\left(x\,\phi^{(d-\delta)}(x)\right)$.    ... (66)

Proof of Theorem 11: Since $F_n^{(d,\delta)}$ is a subclass of $F_n^{(d)}$, we have $f_n^{(d,\delta)} \le f_n^{(d)}$; so the first half of (66) is trivial.
Suppose, then, that $V_1, V_2, \ldots, V_n$ is a sequence of vectors representing a typical member of $F_n^{(d,\delta)}$; and that $V_{i_1}, V_{i_2}, \ldots, V_{i_m}$ (possibly empty) is the subsequence representing the derived member of $F_m^{(\delta)}$. Then the interspersed sequences

$V_1 \ldots V_{i_1-1}$;  $V_{i_1+1} \ldots V_{i_2-1}$;  $\ldots$;  $V_{i_m+1} \ldots V_n$    ... (67)

consist of vectors in the space spanned by the last $d-\delta$ coordinate axes, and are consequently members of

$F_{i_1-1}^{(d-\delta)}$,  $F_{i_2-i_1-1}^{(d-\delta)}$,  $\ldots$,  $F_{n-i_m}^{(d-\delta)}$    ... (68)

respectively. Some or all of the sequences in (67) may be empty (e.g. whenever $i_{j+1} - i_j - 1 = 0$) and the corresponding entry in (68) will then be $F_0^{(d-\delta)}$. Conversely if we take arbitrary members of the classes (68) and use their representative sequences to intersperse the vectors of an arbitrary member of $F_m^{(\delta)}$, we shall obtain a member of $F_n^{(d,\delta)}$; and distinct arbitrary choices lead to distinct members of $F_n^{(d,\delta)}$. Hence

$f_n^{(d,\delta)} = \sum f_m^{(\delta)}\, f_{i_1-1}^{(d-\delta)}\, f_{i_2-i_1-1}^{(d-\delta)} \cdots f_{n-i_m}^{(d-\delta)}$,    ... (69)

where the summation is over all integers $i_j$ such that

$1 \le i_1 < i_2 < \cdots < i_m \le n$    ... (70)

and over all $m$ satisfying $0 \le m \le n$. (When $m = 0$, the term in (69) is to be taken as $f_n^{(d-\delta)}$.) However, if we expand the right-hand side of (66) as a power series in $x$, using (64), we find that the coefficient of $x^n$ is precisely the right-hand side of (69). This completes the proof of Theorem 11.
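The inequality (66) can be checked coefficient by coefficient for small n. The sketch below is only an illustration: it counts self-avoiding walks on the hypercubic lattice by exhaustive depth-first search and expands the right-hand side of (66) as a truncated power series. The function names are hypothetical and the method is practical only for small n.

```python
def saw_counts(d, n_max):
    """Numbers f_n of self-avoiding n-step walks on Z^d, n = 0..n_max."""
    steps = []
    for axis in range(d):
        for sign in (1, -1):
            e = [0] * d
            e[axis] = sign
            steps.append(tuple(e))
    counts = [0] * (n_max + 1)

    def extend(point, visited, n):
        counts[n] += 1
        if n == n_max:
            return
        for s in steps:
            nxt = tuple(p + q for p, q in zip(point, s))
            if nxt not in visited:
                visited.add(nxt)
                extend(nxt, visited, n + 1)
                visited.remove(nxt)

    origin = (0,) * d
    extend(origin, {origin}, 0)
    return counts


def series_mul(a, b, n_max):
    """Product of two power series truncated after x^n_max."""
    out = [0] * (n_max + 1)
    for i, ai in enumerate(a[: n_max + 1]):
        for j, bj in enumerate(b[: n_max + 1 - i]):
            out[i + j] += ai * bj
    return out


def lower_bound_counts(f_delta, f_rest, n_max):
    """Coefficients of phi_rest(x) * phi_delta(x * phi_rest(x)) up to x^n_max."""
    y = [0] + list(f_rest[:n_max])          # series of x * phi_rest(x)
    total = [0] * (n_max + 1)
    power = [1] + [0] * n_max               # y^0
    for m in range(n_max + 1):
        for k in range(n_max + 1):
            total[k] += f_delta[m] * power[k]
        power = series_mul(power, y, n_max)
    return series_mul(f_rest, total, n_max)


if __name__ == "__main__":
    n_max = 6
    f1 = saw_counts(1, n_max)
    f2 = saw_counts(2, n_max)
    print(f2)
    print(lower_bound_counts(f1, f1, n_max))  # case d = 2, delta = 1
```

For d = 2 and δ = 1 the two series agree up to n = 2 and first differ at n = 3, where the bound gives 32 walks against the 36 self-avoiding walks of three steps on the square lattice.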


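Theorem 12: For all positive integers $a$ and $b$

$\lambda(a+b) \ge \lambda(a) + \lambda(b)$.    ... (71)

Proof of Theorem 12: Writing $\delta = a$ and $d - \delta = b$ in (66) we have

$\phi^{(a+b)}(x) \ge \phi^{(b)}(x)\,\phi^{(a)}\!\left(x\,\phi^{(b)}(x)\right)$.    ... (72)

From the left-hand side of (6), we have for any $d$

$\phi^{(d)}(x) = \displaystyle\sum_{n=0}^{\infty} f_n^{(d)} x^n \ge \sum_{n=0}^{\infty} (e^{\kappa} x)^n = [1 - \lambda(d)x]^{-1}$    ... (73)

by virtue of (63). Thus

$\phi^{(a+b)}(x) \ge \phi^{(b)}(x)\,[1 - \lambda(a)\,x\,\phi^{(b)}(x)]^{-1}$
$\qquad \ge [1 - \lambda(b)x]^{-1}\,\{1 - \lambda(a)x/[1 - \lambda(b)x]\}^{-1}$
$\qquad = \{1 - [\lambda(a) + \lambda(b)]x\}^{-1}$.    ... (74)

However, the radius of convergence of the right-hand side of (74) is $[\lambda(a) + \lambda(b)]^{-1}$; and this must be at least as great as the radius of convergence of the left-hand side of (74), namely $1/\lambda(a+b)$. This proves (71).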

Theorem 13: The function $\psi_a(x)$, defined by

$\psi_a(x) = x/\phi^{(a)}(1/x)$,  $x > \lambda(a)$;
$\psi_a(x) = 0$,  $x = \lambda(a)$,    ... (75)

satisfies the functional inequality

$\psi_{a+b}(x) \le \psi_a(\psi_b(x))$,  $x \ge \lambda(a+b)$,    ... (76)

where $a$ and $b$ are positive integers. The derivative of $\psi_a(x)$ exists and satisfies

$0 < \psi_a'(x) < 1$,  $x > \lambda(a)$.    ... (77)

Proof of Theorem 13: By substituting (75) into (72), we obtain (76). To prove (77), we first note that the coefficients in the expansion of $\phi^{(a)}(x)$ are all positive integers; so that $\phi^{(a)}(x)$ is a positive strictly increasing function for $0 \le x < 1/\lambda(a)$. Thus $\psi_a(x)$ is also a positive strictly increasing function for $x > \lambda(a)$. Within its circle of convergence $\phi^{(a)}(x)$ is analytic; and hence $\psi_a(x)$ is differentiable for $x > \lambda(a)$. Finally


0 < Val) = > poe pe z E 1 ve (78) 


Poe V w) fS VEJË S KOT 


which establishes (77). 
Theorem 14: For any pair of positive integers $a$ and $d$,

$\displaystyle\int_{\lambda(a)}^{\lambda(ad)} \dfrac{dx}{x - \psi_a(x)} \ge d - 1$.    ... (79)

In the particular case when $a = 1$, the inequality (79) yields

$\lambda(d) \ge (2d-1) - \log(2d-1)$,    ... (80)

which establishes (5).
Proof of Theorem 14 : From (73), we see that 9“(x)—00 as w—1/A(a) from 


below. Hence, by (75), Ya(«) is continuous at v Ma). By integrating (78) from 


x = Ma) to x = y, we get 


Vral)— YA) < yA) : pe a(S) 
and hence, by (75), y—Valy) > Ma) > 0. ie (82) 


; = dy ; 
Therefore the integral I(x) = J TERO «x > Aa), (83) 
positive integrand; and therefore I(x) has a positive deri- 


has a bounded continuous 
ing function of æ. Since yaly) > 0, we have 


vative and is a strictly increas 
x 
I(x) > dY _ 50 as %—00; (84) 
Ma) 
so I(w) increases sti to co as x goes from A(a) to co. 
Consequently there exists a uniquely defined inverse function (č) such that 
= ër dy z 
ce 5) y—Val¥) 
Since Z(x) has a positive derivative, y(E) must be differentiable and have a positive 
ES ‘ation of (85) yields 
deriva . and differentiation © 
ss (E) = MË—VIXË). sa (86) 
Hence the right-hand side of (86) is differentiable in view 
xistence of XT č). Thus the left-hand side of (86) is differen- 
of Theorem 13 and the të E KË ësh eni r 


: ; tly MË 7 
tiable. Consequently X yE) = vE — valé): ne (87) 
The mean-value theorem now gives, 


rictly from 0 


20... we (85) 


From (84), (8) > Xa). 


by virtue of (77). 
tisfying 0 < 0 < 1, 

(E-) = ME) EHRE > KËNË) = VAXËN (88) 
el) 


s from (86). 


positive 


and this is strictly 
me @ sa 


a i 4 step result 
Saget wee 7 yntnadiction; Jet us now suppose that 
For the sake 0 y(d—1) > X(ad). ve (89) 
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We then have, by repeated use of (76) and (88) alternately, 
0 = YraalA(ad)] < Vralx(d— DI < Yraca—vlPalx(d— DI 
< Vraanix(d— 2)1 < Vaa-alYalx(d—2)] 


< Vata xA—3)) K --- < VdxON = Wal A(@)] 0. ne (90) 
This contradiction shows that (89) is false. Hence 
x(d—1) < A(ad): i (91) 
and therefore, from (85), 
x(d-1) 7 Mad) dy çi 
id I, vv) ~ p FA 


This proves (79). By substituting (64) and (75) into (79), we find that (79) is equi- 
valent to 
Mad) 


W q— 
fi+[ 5 S| jë L. a (93) 
Ma) n=1 © j v 
Let us now approximate the integral in (93). We have 
fo 2a, fP = 2a(2a—1) ve (94) 
and 2a(2a—1)? < fP < 2a(2a—1)"4, n> 3. (95) 
ioe) 
Za , 2a(2a—1) , 2a(2a—1)? fP (2a, 2a(2a—1) 
H + us we (96 
pe FEL a  ' æ(æ—l1) SË r z QI ma esh (9) 
A little manipulation of (96) leads to Š 
ao 
“il, (a—1)(2a—1)? f @ 4-1 a+1 97 
2a ' ala®+-2(a—1ja+ 2a—l2a—])] ? +12, a" | aa OU ae 
n= 


Since a walk whose steps are always in the positive directions of the coordinate axes 

is necessarily self-avoiding, we have {@ Za” with the consequence that A(a) > 4. 

Hence, by (97) 
Mad) 


J 


ao 
0 f 1-2) dæ p A(ad)--log A(ad)—A(a)—log À 
E Ha a" ] } x { eens 5 eos a 
Mad) i ( iN m a 
* SIE mk de. 1 9 
ST ala” a— Ne Xa— N24—D)) < 4a(a—1) | a 2(1——). (98) 
Thus (93) yields

$\lambda(ad) + \log\lambda(ad) \ge \lambda(a) + \log\lambda(a) + 2ad - 2a - 4(a-1)$.    ... (99)

When $a = 1$, we have $\lambda(1) = 1$; and (99) gives

$\lambda(d) + \log\lambda(d) \ge 2d - 1 \ge [(2d-1) - \log(2d-1)] + \log[(2d-1) - \log(2d-1)]$.    ... (100)

The left-hand side of (100) is an increasing function of $\lambda(d)$; and (80) follows.
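The two forms of the lower bound can be compared numerically. The following sketch evaluates the explicit bound (80) and, as an illustration, also solves the implicit inequality on the left of (100) for its smallest admissible value by bisection; the function names are hypothetical.

```python
import math


def explicit_bound(d):
    """Lower bound (80): lambda(d) >= (2d - 1) - log(2d - 1)."""
    y = 2 * d - 1
    return y - math.log(y)


def implicit_bound(d, tol=1e-12):
    """Smallest value of lam satisfying lam + log(lam) >= 2d - 1, cf. (100)."""
    target = 2 * d - 1
    lo, hi = 1.0, float(target)   # lam + log(lam) is increasing on this interval
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid + math.log(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for d in range(1, 6):
        print(d, explicit_bound(d), implicit_bound(d))
```

The implicit form is never weaker than (80), as the second inequality in (100) shows; for d = 2 it gives about 2.208 against 3 - log 3, which is about 1.901.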


It is somewhat disappointing that the case $a > 1$ does not lead to a stronger version of (80). It follows from (98) that (93) can never yield anything stronger than

$\lambda(ad) + \log\lambda(ad) \ge \lambda(a) + \log\lambda(a) + 2ad - 2a \ge 2ad - 1$,    ... (101)

on using (80) on the right-hand side of (101); and, of course, the left-hand side of (101) may not be true. If indeed (80) can be strengthened, then presumably inequalities such as (66) and (88) are capable of sharpening.
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DENSITY IN THE LIGHT OF PROBABILITY THEORY-III 


By E. M. PAUL 
Indian Statistical Institute 


SUMMARY. Using magnification methods, we prove the Erdős-Wintner theorem on additive arithmetical functions having distributions, and obtain some generalizations of it. It is also shown that if each of a finite collection of additive functions has a distribution, they have a joint distribution in the sense of logarithmic density.


1. NOTATION AND TERMINOLOGY


In this paper, we shall use the specific space X and the measure P in it that were used in the previous two papers (Paul, 1962a and 1962b), in connection with logarithmic density. $q_1 = 2, q_2, \ldots$ will stand for the prime numbers. If $A$ is any set of real numbers, $\lambda(A)$ will stand for the sum of the reciprocals of the positive integers in $A$. For each positive integer $n$, $A_n$ will stand for the set of positive integers of the form $q_1^{a_1} \cdots q_n^{a_n}$, the exponents being non-negative integers. If $S$ is any set of positive integers,

lim inf MEDS» and lim sup LERA 

will be called respectively the lower and upper -densities of S. MA, is of course 

equal to re se (1-z) g If S is any set of positive integers, 
dı Qa n 


nes NS ana — ALANAS 
He BL 


lim Me, ti lim ne, BI 
B _y00 mia ae 


the lower and upper strong logarithmic densities of S; they will 
In this paper §, 6, À and A stand respectively for upper 
arithmic densities. It is well 


are called respectively 
be denoted by X (8) and A" (5). 
tural, upper logarithmic and lower log 


natur: E 
ural, lower na shatfor every tet 5, 


known (Tsuji, p. 121, for example) 
SEN KAZASH SË. 
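The Π-density ratio of this section can be illustrated numerically. The following is a minimal sketch, assuming the reading $\mu(A)$ = sum of reciprocals of the positive integers in A and $A_n$ = integers built from the first n primes; the function names and the truncation bound are illustrative choices, not part of the paper.

```python
from fractions import Fraction


def first_primes(n):
    """Return the first n primes q_1 = 2, q_2 = 3, ... by trial division."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def mu_A(n):
    """Exact mu(A_n) = prod_{i<=n} (1 - 1/q_i)^(-1)."""
    total = Fraction(1)
    for q in first_primes(n):
        total *= Fraction(q, q - 1)
    return total


def smooth_numbers(n, bound):
    """All members of A_n (products of powers of q_1..q_n) not exceeding bound."""
    members = [1]
    for q in first_primes(n):
        extended = []
        for m in members:
            while m <= bound:
                extended.append(m)
                m *= q
        members = extended
    return sorted(members)


def pi_density_ratio(n, in_S, bound):
    """Truncated estimate of mu(A_n ∩ S) / mu(A_n); exact only as bound grows."""
    part = sum(Fraction(1, m) for m in smooth_numbers(n, bound) if in_S(m))
    return part / mu_A(n)
```

For S the set of even numbers the exact ratio is 1/2 for every n, since the members of $A_n$ divisible by 2 are exactly twice the members of $A_n$; the truncated estimate approaches this value as the bound increases.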

Let J, consisting of $j_1 < j_2 < \cdots$, be a sequence of positive integers. A

së Pi integors will be said to be right-complete with respect to J in case 

set S of positiv E + th pë y 

for every positive integer 7 and arbitrary ™ such that jra < m < jr di’ dn" € 8 

rt) qer e 8; here the a”s are arbitrary non-negative 


i 0. Tm 
implies that qd Qe 1) | F 
s na This definition is in accordance with that in the 


: a - is any integer > Jr x 
a po e së recente in the first paper (Paul, 1962a). Instead of the 
abstract theor, g 


v. 
» Ki Le St. 
integers of the form g" që”... q,” ; inplace 
abstract space Xi, we now have the set of intege 1 Le PA place of 
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Dje az. 
+1) ja . ii 
X, we now have the set of integers of the form hr M a, , and so on: the expo 
1 3 


nents are arbitrary non-negative integers. In each space «,,, the 0-element will be the 


number 1 = ¢. ie PA . We can now talk of the basic vectors of a set of positive 
Jon-1y t+ 1 im 


integers that is right-complete with respect to J. It will be recalled that we adopted 
the same procedure in connection with the generalized magnification theorem (Paul, 
1962b), I9(S) and MES) have been defined there, for an arbitrary set S. 


Now let S be right-complete with respect to J. A basic vector (Xy, -> Ums 
0, 0,...), with x,, as the last positive coordinate, will be said to arise at the r-th stage 


rt ek i 
in case je- +1 < m <j, Any number of the form dj wg UTA e g where 


oT 
Uj pyres Ty are arbitrary non-negative integers and k > (j,+-1) is a member of S; 
5 tË N 
it will be said to arise at the r-th stage. We shall say that the number PA KA enters 


S through the basic vector (£y, ..., tm 0,0,...). The set of members of S that arise 
at the r-th stage will be denoted by S, We shall denote M,(S,) by D,. 


2. SOME CONVERGENCE THEOREMS 


Lemma 1: If J is arbitrary and S is a set of positive integers right-complete
with respect to J, P{Mj(S)} = AS).


Proof: Let us put A= AS) and 7 = P{M,(8)}. Evidently A > r. 


Take anye>0. Take a positive integer 7 such that 


(1-5) 1—5) Sa ( t= greg for all s >r wx 12) 
and P(D,U ... U D) > r—e. se (8) 


For (2), we use Mertens' theorem.


oe yy tim My GASP 


Tim HCG, LN 5} (4) 
ty. log — log g, > 
Jt Jr 
Clearly NSA 
SA 
Till the end of the proof, L will stand for the interval 
Cn . 
1, Sat 
Now choose a positive integer m such that the following four conditions hold.


HLS) i 

DT N +e sa (6) 

» HL), E oe Ë 

log Tiersen) ne pë (7) 
KLOS) $ v—e PrE 
ML) ae (8) 

KUNSU-USË < Te (8) 

ML) 


where T = P(D, U ... U D,). 
There are infinitely many values of m satisfying (6); from among them, we choose an m so large
that (7), (8) and (9) hold. For (9), we recall that $S_1 \cup \cdots \cup S_r$ has natural and hence
strong logarithmic density equal to $P(D_1 \cup \cdots \cup D_r)$.


MIS—S U -- USHNË  v—r—ze, 0) 
ML) 


by (8) and (9). 7 1 
—4\(1—1/8) ...( 1— 
Again, P(D;+1 UDz 42 U «+ UD, +m) > (1 I 13) ( Term 


xHIS—(SU --- USHIN B- ke SË 


In order to see this, we note that
{8—(SiU ss US,)} N LC Sra U see ISen nja (12) 
Now let ( x 0, 0, ...) be a basic vector arising at the (7--1)-th stage. Sum 
ow let (a, Vas ee Uy? 2” 


all positive integers S Gleam) and entering S through this basic vector 
a rt 


of reciprocals of 
is i 


< : KY i 
E ting ED Kim 
the v's taking arbitrary non-negative integral m 
; EP a 1 ( 1— l 
2... ËN (1 eati i Gjer 


{p-measure of the cylinder set whose base is the point 
j 1 
<<a 


See 


(Wy, ne Piety 


dj rim) 


Thus 


(1—4) ef i= ) XAS N (0, Derm? < P(D,41): 


Gj, rem) 
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Similarly (1—3)... e ) XM Sirsa N (0, Gj: I} S P(Dria)s 


Germ) 


and so on. Adding, we get 


B. 
P(Drirt.::+Drin) > (1-8)... (1——— ) xHSh U USH N D 


er) 
An application of step (12) now gives step (11). From (11) we get 
1 


ti L (13) 
P(Dy 44 U... U Dm) > (A —T—2¢). M ), 2108 Taam 
by (2) and (10). 
Nov N—d)—r < KS _ pip, ... U Dray) by (8), 
HL) 
1 13), 
< X+e—| THA — T26). MI) Sie by (6) and (13) 


1 by (7) 
“pe—( T+ (N—T—2e)(1—e) log q; am) T 
< Ne { 7 5 log Dons ¢ X dr. mi 


< AV +e—{P+ H(A’ —T—26)(1-e)} 
< N+e—{r—€+ HA'—r—2e)(1—e)) by (3). 
From this inequality, we get by elementary algebraical calculations 
e(8—2e) , 
l—e ` 


N— S 


—2 
A—TK =. 


—e 
Making e—>0, we get A Sri 


Corollary: Let J be arbitrary. Let S be a set of positive integers that is right- 
complete with respect to J. Then 
A(S) = lim “0, Yin INS} 

= ie log Yin 
Proof: The last expression $= \lambda'$. In (4), $\lambda'$ appears as dependent on $\epsilon$
(through r); but the definition of $\lambda'$ shows that it is really independent of $\epsilon$. Now we
use the inequality


1, — (8—2e) 
Mer g 
Lemma 2: If J is arbitrary, and S is any set of positive integers, 


PI (9) > NS). 
Proof: The proof is similar to that of the magnification theorem (see Paul,
1962a, p. 108). As in Paul (1962b), let us call the
Kies X, space Fi; Zini zai X, -space Yy 
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and so on. The space X is now regarded as the space Y, Y,.... Let y, be the point 


(to v; ) of Yh. By v(yı) we shall mean the number 24 3% ... gji . Similarly if 
1 


Ya is the point (w, +1 » = %,) of ya, (Yq) will stand for the number q +D Pe që -and 
i Gati) je? 


so on. 
Let A, be the set of vectors (Vi, Ya --- ) such that yz Æ (0, 0, ...,0) and v(y,) 


Vo)... v(yp) 68. “Then lim A, C M7 (8) C lim A, U (a subset of Z); we recall that 7 
pa n 


is the set of vectors containing only a finite number of positive coordinates 


So PUM” (9% = Piim 4,}. 
Since P is countably additive, it is sufficient if we prove that for an arbitrary k, P(A;
> NS). AU Ani U. is the J -magnification of a certain right- 


U Anya U) > 
complete (with respect to J) set T, of positive integers. TU (the set Tj,-1, of positive 


integers all of whose prime factors are from among 2,3..., jam) D'S. Hence 


Pia: U Ara U...) PË = APs), by Lemma 1 = APs U Hav} > 4(5). 


Theorem 1: Let f(n) be a finite real-valued function defined on the set of positive
integers. Let $K = \{k_1, k_2, \ldots\}$ be an increasing sequence of positive integers. Let
$\{C_n\}$, $n \in K$, be a sequence of real numbers such that $f(2^{x_1} \cdots q_n^{x_n}) + C_n$ converges with
probability 1 to a random variable g(x), as $n \to \infty$ through the numbers in K.

Then if f has a distribution in the sense of logarithmic density, the sequence $C_n$
converges to a finite limit C and the distribution of f is the same as that of g(x) - C.
Let Q denote the distribution of f(n). The sequence {0n} must be 
} is unbounded above. Take a small number say 0.01. 
P,{g(«) > d} < 0.01; also let P,{g(z) = d} = 0. 
Q(d') = 0. Since fc) is unbounded above, we 


Proof : 
pounded. For, suppose {Cn 


Let d be a number so large that 
< 0.02, 


Let d’ h that Q(— %9, d') 
e f£ ne K such that 


have infinitely many values 0 
PAJ- G") < a} > 0.99. 


c density < 0.02. Now if a set S has logarithmic 


ef as 


Also, B{f(n) < dy has logarithmi 
n 
-th II-density ratio = MA ust 


for sufficiently large n, its n 
0 is arbitrarily small) < 0.92; here we use Mertens' theorem.


density < 0.02, then 


be < 0.02-(e7—1)-telë > 
(An DË where S = Bf (n) < 


But “I d} = Pf... 6") < d3 > 0.99. So we 
~ HAy) 
have Law Similarly {C,} is bounded below. 
Now let b be a limit point of $\{C_n\}$; let J be a subsequence of K such that $C_n \to b$
when $n \to \infty$ through J. Then $f(2^{x_1} \cdots q_n^{x_n}) \to g(x) - b$ as $n \to \infty$ through J. Let a
be a number such that P,{g(x)—b = a} = 0 and Qa) = 0. Then 
Ble) baj C MHE: fto) <a} and Bfgle)—b > a} C MHE : fin) > 2}. 


By Lemma 2 Prla(e)—d < a} < ARË fin) < aj 


and P,{g(x)—b > a} < ABE fin) > a}. 
Now we use the hypothesis that f has a distribution in the sense of logarithmic density 
and deduce that Q is the distribution of g(x)—b. 


It follows at once that the sequence $\{C_n\}$ has only one limit point.


3. ADDITIVE ARITHMETICAL FUNCTIONS


A finite real-valued function f defined on the set of positive integers is said to
be additive in case f(mn) = f(m) + f(n) whenever m and n are mutually prime.
Erdős (1938) proved that the following conditions (we shall refer to them as the EW-
conditions) are sufficient to ensure that f has a distribution :


$\sum_n \frac{f'(q_n)}{q_n}$ and $\sum_n \frac{(f'(q_n))^2}{q_n}$

converge, where

$f'(q_n) = f(q_n)$ if $|f(q_n)| \le 1$, and $f'(q_n) = 1$ if $|f(q_n)| > 1$.
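The two series in the EW-conditions can be examined numerically through their partial sums. The sketch below is only an illustration, assuming the truncation $f'(q) = f(q)$ when $|f(q)| \le 1$ and $f'(q) = 1$ otherwise; the helper names are hypothetical, and a finite partial sum can of course only suggest convergence or divergence, not prove it.

```python
import math


def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes not exceeding limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [i for i, flag in enumerate(sieve) if flag]


def truncate(value):
    """Assumed truncation f': keep the value if |value| <= 1, else replace by 1."""
    return value if abs(value) <= 1 else 1.0


def ew_partial_sums(f_on_primes, limit):
    """Partial sums of sum f'(q)/q and sum f'(q)^2/q over primes q <= limit."""
    first = second = 0.0
    for q in primes_up_to(limit):
        t = truncate(f_on_primes(q))
        first += t / q
        second += t * t / q
    return first, second
```

For example, with f(q) = log(1 - 1/q) (the additive function log(φ(n)/n) on primes) both partial sums stabilise quickly, while with f(q) = 1 (the number of distinct prime factors) the first sum keeps growing like log log of the limit.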


Subsequently, Erdős and Wintner (1939) proved that the EW-conditions are necessary
if f is to have a distribution. Kac (1949) has pointed out the close analogy between
the EW-conditions and Kolmogorov's three series theorem (Doob, 1953, p. 111).
Magnification theory enables us to push this analogy into logical deducibility.
First we prove the Lemma :

Lemma 3: The infinite series $\sum_n f(q_n^{x_n})$ converges with probability 1 if and only
if the EW-conditions hold.


Proof: Let B be the “box” 


(0, 1)X(0, DË vo BE: We introduce an 
auxiliary additive arithmetical functio 


n g(x) by putting 
Gn) S Fan, & = 1,2,3,.... 
eries theorem, validity of the ZW-conditions implies the con 
vergence with Probability 1 of the series Y Kg). 
n 


By Kolmogorov’s three si 


v 
It follows that the series uf (1.7) 


converges almost everywhere on B. Now P(B) = p—s > 0. So by the zero-one 
n In 


law (Doob, 1953, P. 102), Y KË converges with probability 1. The converse is 
n 


by employing the converse part of the three series theorem. 
If the EW-conditions hold, the sequence $f(2^{x_1} \cdots q_n^{x_n})$ converges almost everywhere
and hence by Theorem 2 of Paul (1962a, p. 109) f has a distribution in the sense
of logarithmic density. This is weaker than the theorem of Erdős who proved that if
the EW-conditions hold, f has a distribution in the sense of natural density.

Theorem 2: Let f(n) be an additive arithmetical function having a distribution 
in the sense of logarithmic density. Then the EW-conditions hold. 

Proof: We use the following theorem (see Doob, 1953, p. 121). 


Let $y_1, y_2, \ldots$ be mutually independent random variables and suppose that
for some K > 0

$\limsup_{n \to \infty} P\{|\sum_{j=1}^{n} y_j| < K\} > 0.$

Then there is a sequence $\{d_n\}$ of numbers such that $\sum_n (y_n + d_n)$ converges with
probability 1.
Our infinite series f(2”) + f(3")+... satisfies these conditions. In fact, 


let Q be the distribution of the function f(n). Let a, P be continuity points of Q 
such that Q(e, #)=p> 0. Since A = Efn : f(n) ea, A} has logarithmic density 


P. u (4) > 2 (Y being Buler’s constant). Hence for all sufficiently large n, 
3 e 
P, {f(2 UA ela, hy} > 2 => 0; 


It follows at once from Theorem 1 that the series $f(2^{x_1}) + f(3^{x_2}) + \cdots$ converges with
probability 1; hence the EW-conditions hold.
This theorem was proved by Erdős and Wintner; their theorem is slightly
weaker in that they require that f(n) have a distribution in the sense of natural density.


As a corollary to our theorem we have the result : 


If an additive arithmetical function has a distribution in the sense of logarithmic
density, it has a distribution in the sense of natural density.

Proof: By Theorem 2, the EW-conditions hold; hence by the theorem of
Erdős, we get the result.
We can easily generalize the main result above to a class of weakly additive
arithmetical functions. Let J, consisting of $j_1 < j_2 < \cdots$, be arbitrary. We shall call
a positive integer greater than 1 a pseudo-prime in case all its prime factors are from
among the same block $q_{j_r+1}, q_{j_r+2}, \ldots, q_{j_{r+1}}$ (for some r). The pseudo-primes of
the form $q_{j_r+1}^{a_{j_r+1}} \cdots q_{j_{r+1}}^{a_{j_{r+1}}}$ will be said to constitute the (r+1)-th class of
pseudo-primes. Every positive integer has a unique representation as a product of
pseudo-primes from different classes (one pseudo-prime at most from each class). We
shall refer to this representation as the canonical representation of the number. Two
positive integers m and n will be said to be mutually pseudo-prime in case the classes
contributing pseudo-prime factors in the canonical representation of m and the
corresponding classes associated with n are disjoint.
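The canonical representation into pseudo-primes can be made concrete with a short sketch. It assumes that the blocks are $q_{j_r+1}, \ldots, q_{j_{r+1}}$ with $j_0 = 0$ and that J is given as a finite list long enough to cover all prime factors of the integer; the function names are illustrative only.

```python
def first_primes(count):
    """Return the first `count` primes by trial division."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def canonical_representation(n, J):
    """Map class number -> pseudo-prime factor of n drawn from that class.

    J = [j_1, j_2, ...] lists the block boundaries; class r+1 uses the primes
    q_{j_r + 1}, ..., q_{j_{r+1}} (with j_0 = 0, an assumption of this sketch).
    """
    primes = first_primes(J[-1])
    factors = {}
    start = 0
    for cls, end in enumerate(J, start=1):
        part = 1
        for q in primes[start:end]:
            while n % q == 0:
                part *= q
                n //= q
        if part > 1:
            factors[cls] = part
        start = end
    if n != 1:
        raise ValueError("J does not extend far enough to factor n")
    return factors


def mutually_pseudo_prime(m, n, J):
    """True when no class contributes a pseudo-prime factor to both m and n."""
    shared = canonical_representation(m, J).keys() & canonical_representation(n, J).keys()
    return not shared
```

With J = [2, 4, 6] the blocks are {2, 3}, {5, 7}, {11, 13}; then 360 = 72 · 5 has pseudo-prime factors 72 and 5, and the coprime integers 2 and 3 are not mutually pseudo-prime because both come from the first class.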


Let f(n) be an arithmetical function that is weakly additive in the sense that
f(mn) = f(m) + f(n) whenever m and n are mutually pseudo-prime. Exactly as before,
we can prove that if f has a distribution in the sense of logarithmic density, the series
x [ PG)... Han 
r f Tat) Riess 
will converge with probability 1. 
Tog Goss is bounded, this condition is also sufficient to ensure that f has 
n 
a distribution in the sense of logarithmic density. Here we use the generalized magni- 
fication theorem (Paul, 1962b). 


We recall that a finite collection of arithmetical functions $f_i(n)$, i = 1, ..., r,
will be said to have a joint distribution in case there is a probability distribution Q
in r-dimensional space such that for every open set A in r-space with Q{Bd(A)} = 0,
$E\{n : (f_1(n), \ldots, f_r(n)) \in A\}$ has density Q(A); here Bd(A) stands for the boundary of A.

Theorem 3: If $f_1(n), \ldots, f_r(n)$ are additive arithmetical functions and each has
a distribution (in the sense of logarithmic and hence of natural density), then they have 
a joint distribution in the sense of logarithmic density. 


Proof: Each of the series X HA) converges with probability 1. From 


this we can deduce our theorem using the magnification inequality PIM WSJ} < AS), 


exactly as we proved Theorem 2 of Paul (1962a, p. 109). Similarly, we have 
f log jni ; 

Theorem 4: Let J be such that is bounded. Let fi (n), ..., f, (n) be 
weakly additive arithmetical functions and let each of these r functions have a distribution 
in the sense of logarithmic density. Then they have a joint distribution in the sense of 
logarithmic density. 
SOME FIRST PASSAGE PROBLEMS AND THEIR
APPLICATION TO QUEUES* 


By N. U. PRABHU and U. NARAYAN BHAT 
The University of Western Australia 


SUMMARY. In this paper we study some first passage problems concerning the process $i + S_n - n$
($n \ge 0$), where $S_n$ is the sequence of partial sums of independent and identical random variables. The
results are applied to Markov chains imbedded in the queueing processes GI|M|1 and M|G|1.


1. INTRODUCTION


Let $\{X_n\}$ (n = 0, 1, 2, ...) be a sequence of mutually independent and identical
random variables which assume non-negative integral values, and $S_n = X_0 + X_1 + \cdots + X_{n-1}$
(n = 1, 2, ...) be the partial sums of $\{X_n\}$. We define the random variables

$T_i = \min\{n \mid S_n - n \le -i\}$, $U_j = \min\{n \mid S_n - n \ge j\}$ ... (1.1)

$N = \min\{n \mid S_n - n = 0\}$ ... (1.2)

where $i > 0$, $j \ge 0$; clearly, $T_i$ is the first passage time to a distance i to the left, and
$U_j$ the first passage time to a distance j to the right while N is the recurrence time of
0. In this paper we establish some results concerning these random variables. These
results are then applied to the queueing systems GI|M|1 and M|G|1, both of which
have a single server and the queue discipline "first come, first served". In the first
system the inter-arrival times have the distribution dF(t) ($0 < t < \infty$), and the service
time has the negative exponential distribution $\mu e^{-\mu t}\,dt$ ($0 < t < \infty$); let $Q_n$ denote
the queue length just before the arrival of the n-th customer in this system. In
the second system, the arrivals are at random, i.e. the inter-arrival times have the distri- 
bution ze—“tdt, while the service time has the distribution dF (t); here let Qn denote the 
queue length just before the departure of the n-th customer. For a detailed description
of the two systems, and references see Kendall (1951, 1953). The process $\{Q_n\}$
(n = 0, 1, 2, ...) is in both cases a Markov chain; earlier discussion of these chains was
confined mostly to their ergodic behaviour, but during recent years Takács (1960, 1961),
Finch (1960), and others have investigated their transient behaviour; however,
the methods used so far are sometimes difficult, and moreover, in many cases, the results
have not been obtained explicitly. In this paper we deduce all the important properties
of $\{Q_n\}$ from those of the random variables $T_i$, $U_j$ and N.
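The random variables of (1.1) and (1.2) can be read off a sample path directly. The following minimal sketch assumes the reading $T_i = \min\{n : S_n - n \le -i\}$, $U_j = \min\{n : S_n - n \ge j\}$ and $N = \min\{n : S_n - n = 0\}$ with $S_n = X_0 + \cdots + X_{n-1}$ (the inequality signs are hard to make out in the printed text, so they are an assumption here); the function name is hypothetical.

```python
from itertools import accumulate


def first_passage_times(xs, i, j):
    """Return (T_i, U_j, N) for the finite sample path xs = [X_0, X_1, ...].

    Each entry is the first n >= 1 at which the event occurs, or None when
    the event does not occur within the observed path.
    """
    T = U = N = None
    for n, s in enumerate(accumulate(xs), start=1):  # s is S_n
        level = s - n
        if T is None and level <= -i:
            T = n
        if U is None and level >= j:
            U = n
        if N is None and level == 0:
            N = n
    return T, U, N
```

For the path X = (2, 0, 0) the process $S_n - n$ takes the values 1, 0, -1, so that $U_0 = 1$, N = 2 and $T_1 = 3$.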


* A summary of the results of this paper was presented at the Second Summer Research Institute
of the Australian Mathematical Society held at Canberra in January 1962.
2. DISTRIBUTIONS OF $T_i$ AND $U_j$


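Let

$\Pr\{X_n = j\} = k_j$ (j = 0, 1, 2, ...), and 0 otherwise; ... (2.1)

$K(z) = \sum_{j=0}^{\infty} k_j z^j$ ($|z| \le 1$); $\alpha = E(X_n) < \infty$. ... (2.2)

Further, let

$k_j^{(n)} = \Pr\{S_n = j\}$ ($n \ge 1$), $k_j^{(0)} = 0$ ($j \ne 0$), $k_0^{(0)} = 1$; ... (2.3)

$K_j^{(n)} = \Pr\{S_n \le j\}$, $a_j^{(n)} = \Pr\{S_n > j\}$, $a_j^{(1)} = a_j$. ... (2.4)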

We have then the following theorem. 


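Theorem 1: For $i \ge 1$ we have

(a) $g(i, n) = \Pr\{T_i = n\} = \Pr\{i + S_r - r > 0\ (r = 1, 2, \ldots, n-1),\ i + S_n - n = 0\}$

$= 0$ if $n < i$, and $= \frac{i}{n} k_{n-i}^{(n)}$ if $n \ge i$; ... (2.5)

(b) $\Pr\{T_i < \infty\} = \sum_n g(i, n) = 1$ if $\alpha \le 1$, and $= \zeta^i$ if $\alpha > 1$, ... (2.6)

where $\zeta$ is the positive root of the equation z = K(z); and

(c) $E(T_i) = i(1-\alpha)^{-1}$ if $\alpha < 1$, $= \infty$ if $\alpha \ge 1$. ... (2.7)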
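Formula (2.5), read as $g(i, n) = (i/n)\,k_{n-i}^{(n)}$ for $n \ge i$, can be checked against a direct computation for any distribution $\{k_j\}$ with finite support. The sketch below uses exact rational arithmetic; the function names and the example distribution are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction


def convolve(p, q):
    """Distribution of the sum of two independent non-negative integer variables."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for a, pa in enumerate(p):
        for b, qb in enumerate(q):
            out[a + b] += pa * qb
    return out


def n_fold(k, n):
    """Distribution k^{(n)} of S_n, the n-fold convolution of k with itself."""
    dist = [Fraction(1)]
    for _ in range(n):
        dist = convolve(dist, k)
    return dist


def g_formula(k, i, n):
    """g(i, n) from the closed form (i/n) k^{(n)}_{n-i}, zero when n < i."""
    if n < i:
        return Fraction(0)
    dist = n_fold(k, n)
    j = n - i
    return Fraction(i, n) * (dist[j] if j < len(dist) else Fraction(0))


def g_direct(k, i, n):
    """g(i, n) by following i + S_r - r step by step and killing paths at 0."""
    alive = {i: Fraction(1)}
    for step in range(1, n + 1):
        nxt = {}
        for z, pz in alive.items():
            for x, kx in enumerate(k):
                w = z + x - 1
                nxt[w] = nxt.get(w, Fraction(0)) + pz * kx
        if step == n:
            return nxt.get(0, Fraction(0))
        nxt.pop(0, None)  # paths that reached 0 early do not count
        alive = nxt
    return Fraction(0)
```

The two functions agree exactly for every i and n on which they are compared, for instance with $k_0 = 1/2$, $k_1 = k_2 = 1/4$.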


If we interpret $X_n$ as the amount of water which flows into a dam during a
unit time interval, and suppose that, as in Moran's (1959) storage model, a unit amount
of water is released from the dam at the end of each such interval, unless the dam
is empty, then $T_i$ is the duration of the "wet period", i.e. the time taken by the dam
with initial content i to dry up. The above results are a restatement of those derived
by Kendall (1957) for the analogous problem in continuous time; however, for the sake
of completeness the main outlines of the proof are given below. Since $i + S_r - r > 0$
for $r < i$ we must have $T_i \ge i$; moreover, since we can write $i + S_n - n =
S_i + S_{n-i} - (n-i)$ ($n \ge i$), it follows that

$T_i = i + T(S_i)$. ... (2.8)

Using (2.8) we obtain a difference equation for g(i, n), which yields the solution (2.5)
[see also Gani (1958) and Takács (1961)]. Further, it is obvious that $\Pr\{T_i < \infty\}$
is of the form $\zeta^i$, so that (2.8) gives

$\zeta^i = \sum_{j=0}^{\infty} k_j^{(i)} \Pr\{T_j < \infty\} = [K(\zeta)]^i$ ... (2.9)

which leads to (b). Finally, using $E(T_i) = iE(T_1)$, we obtain from (2.8),

$E(T_i) = i + \sum_j k_j^{(i)} E(T_j) = i + i\alpha E(T_1)$ ... (2.10)

which gives (c).
Theorem 2: For $i \ge 1$ we have

$\Pr\{i + S_r - r > 0\ (r = 1, 2, \ldots, n-1);\ i + S_n - n = j\}$

$= k_{n+j-i}^{(n)} - \sum_{m=1}^{n-1} g(i, n-m)\, k_{m+j}^{(m)}$. ... (2.11)

For, we have

$k_{n+j-i}^{(n)} = \Pr\{i + S_n - n = j\} = \Pr\{T_i \ge n,\ i + S_n - n = j\}$
$+ \Pr\{T_i < n,\ i + S_n - n = j\}$. ... (2.12)

The first term in the last expression is the required probability, while the second term
can be written as

$\sum_{m=1}^{n-1} \Pr\{T_i = m\} \Pr\{i + S_n - n = j \mid T_i = m\}$

$= \sum_{m=1}^{n-1} g(i, m) \Pr\{i + S_n - n = j \mid i + S_m - m = 0\}$

$= \sum_{m=1}^{n-1} g(i, m)\, k_{n-m+j}^{(n-m)}$, ... (2.13)

which proves (2.11).


We observe that when j = 0 the expression (2.11) reduces to g(i, n), since

$k_{n-i}^{(n)} = g(i, n) + \sum_{m=1}^{n-1} g(i, n-m)\, k_m^{(m)}$, ... (2.14)

while, when j < 0, (2.11) vanishes identically, as it should, since

$k_{n-i-\nu}^{(n)} = \sum_{m=1}^{n-1} g(i, n-m)\, k_{m-\nu}^{(m)}$ ($\nu > 0$). ... (2.15)

The identities (2.14) and (2.15) can be easily proved. Thus Theorem 2 is a generalization
of (2.5).
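The identity (2.11), read as $k_{n+j-i}^{(n)} - \sum_{m=1}^{n-1} g(i, n-m)\,k_{m+j}^{(m)}$ for the probability that $i + S_r - r$ stays positive up to time n-1 and equals j at time n, can also be verified numerically. This is a sketch under that reading, with g(i, n) taken from (2.5) and with hypothetical function names; coefficients with a negative index are treated as zero.

```python
from fractions import Fraction


def n_fold(k, n):
    """Distribution of S_n: the n-fold convolution of k with itself."""
    dist = [Fraction(1)]
    for _ in range(n):
        new = [Fraction(0)] * (len(dist) + len(k) - 1)
        for a, pa in enumerate(dist):
            for b, kb in enumerate(k):
                new[a + b] += pa * kb
        dist = new
    return dist


def k_n(k, n, j):
    """k^{(n)}_j = Pr{S_n = j}, taken as zero outside the support."""
    dist = n_fold(k, n)
    return dist[j] if 0 <= j < len(dist) else Fraction(0)


def g(k, i, n):
    """g(i, n) = (i/n) k^{(n)}_{n-i} for n >= i, zero otherwise, as in (2.5)."""
    return Fraction(i, n) * k_n(k, n, n - i) if n >= i else Fraction(0)


def taboo_formula(k, i, n, j):
    """Right-hand side of (2.11)."""
    total = k_n(k, n, n + j - i)
    for m in range(1, n):
        total -= g(k, i, n - m) * k_n(k, m, m + j)
    return total


def taboo_direct(k, i, n, j):
    """Pr{i + S_r - r > 0 for r < n, i + S_n - n = j} by direct recursion."""
    alive = {i: Fraction(1)}
    for step in range(1, n + 1):
        nxt = {}
        for z, pz in alive.items():
            for x, kx in enumerate(k):
                nxt[z + x - 1] = nxt.get(z + x - 1, Fraction(0)) + pz * kx
        if step == n:
            return nxt.get(j, Fraction(0))
        nxt.pop(0, None)  # discard paths that have already reached 0
        alive = nxt
    return Fraction(0)
```

For j = 0 both functions return g(i, n), and for j < 0 both return zero, in line with (2.14) and (2.15).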


Theorem 3: We have 
(a) Pr{U= 1) =a 


Pr {Up = 0+1} = È a gli, n) 
Pe 


1 
(b) Pr (U, < aj 
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For, we have 
Pr {Uy = 1) = Pr (Sj—l >} = do (2.18) 
and for n > 1, ; 
Pr {Uo = n-+1} = Pr (S.-r <0 6 = l, 2,..., n), Snii—(n+1) a 0} 
= È PriS,—r <0 (r= 1,2,..., 2-1), S,—n— —i} 
i=1 
x Pr{Spa—(n-+1) > 0 | S, n=} 
= £ Pr {8,_,—(n—1) = S,—n—(S,—1) > —i(r = 1, 2, ...,n—1), S,—n = — i} 
i=1 
xPr {Snia—(n+1)—(S,—2-+4) > 0} 
oe 
= DaPr i+S8,—r > 0 (r = 1, 2, ..., n—1), it 8,—n = OF vee (2.19) 
il 


and the result now follows from (2.5). Further, we have 


Pr (U, < o} = È Pr {U = nj = at È ay Pr (7, < co) 
0 i=l 


| $ a= a e fasi 

0 

as wee (2,20) 
2. 1—K(é) ; 

pas Ti farzi, 

using (2.6). 

j Theorem 4: For j > 1 we have 


(a) Pr (U, = = qy 


- sa n-1 
Pr (U, = n+1} “2 Ay dy 2 alë, n—m) ke} (n > 1) | (2.21) 


1 if a>1 


(b) Pr {U, =< Obi > 
j } | (l—a) > kel if acl. (2.22) 


The proof of (a) is similar to that of (2.16), but we have now to apply (2.11). 
To prove (b) we note that j 


Pr{Uj <a} ot È È a E, E E akeg, n—n) 


ntj-i 
nel i=l j- n=1 i=l m= 


SË, 2 ali a = HRP; Pr {Uy < co). vee (2.28) 


This last expression is the coefficient of zë in the formal expansion of 
By a familiar type of argument [see for instance, Prabhu (1960)], it can be proved
that $|K(z)| < |z|$ in the region $1 < |z| < \zeta$ if $\alpha < 1$ and in $\zeta < |z| < 1$ if $\alpha > 1$;
moreover $\sum_j a_j z^j = [1 - K(z)](1 - z)^{-1}$ in both cases, if we assume that K(z) is capable
of an analytic extension to the region $1 < |z| < \zeta$ in the case $\alpha < 1$. Thus the
expression (2.24) reduces to


= (€<|z|<1, ay 1) 


ae vee (2.25 
at) o (Sk SË). 


Expanding the functions in the regions of their validity we find from (2.25) that the
coefficient of $z^j$ is as given in (2.22).


We also have 


Pr {U;= n+ B= zZ PrS.—l = j—i, S—r <j (r= 2,3,...,2), Salnt) >} 
pa 


= $ kesaj Pr (S,—7 <j (r = 2, 3, ..., 7), 
=1 
Sna (n+) ZJISHI = ji 


SË hj PU; =n} (a> Lj >o) > (2,28) 
i=l 


Adding up (2.26) over n = 1, 2, ... we obtain

$\Pr\{U_j < \infty\} = \sum_{i=1}^{j+1} k_{j-i+1} \Pr\{U_i < \infty\} + a_j$ ($j \ge 0$)

whence we get the result

$\sum_{j=0}^{\infty} z^j \Pr\{U_j = \infty\} = \frac{K(z)}{K(z) - z} \Pr\{U_0 = \infty\}$, $|z| < 1$. ... (2.27)


3. THE MARKOV CHAIN $Z_n = i + S_n - n$


It is clear that the process $Z_n = i + S_n - n$, $Z_0 = i$, is a Markov chain, but we
have not made use of any special property of this chain in deriving the results of
Section 2. The transition probabilities of $\{Z_n\}$ are given by

$P_{ij}^{(n)} = \Pr\{Z_n = j \mid Z_0 = i\} = k_{n+j-i}^{(n)}$ ($n \ge 1$; $i, j = \ldots, -1, 0, 1, \ldots$), ... (3.1)

$P_{ij}^{(0)} = 0$ ($j \ne i$), $= 1$ ($j = i$). ... (3.2)

The random variable N defined by (1.2) is the recurrence time of the state 0 of the chain
$\{Z_n\}$. The following results are implicit in the more general results derived by Kemperman
(1961) using complex variable methods, but in their explicit form are a consequence
of the results of Section 2.
Theorem 5: Let $F_{00}^{(n)} = \Pr\{N = n\}$; we have then


(a) FQ =k 
FG =F ik, gli—l,n) (n> 1). e (8.3) 
ie 
fa if a<l 
b) Foy = Prit < co) = i i < (84)” 
BO<al sf gor 
For, FR = PriS,—1 = 0} = kj, 


while for n > 1 we have 
FG) = PriS,—r £0 (r = 1,2,...,n), Say — (NH1) = 0} 
=Pr{S,—r>0 (r= 1,2, 1.. 2), Shyy—(n--1) = 0} 
+Pr{S,—r < 0 (r = 1, 2, ,.., 2), Snii—(n-+1) = 0} 
-17 4 z 
+25 PriS—r KO (r—1,8,..., m 
m=1 


4 


820 (r = M41, yn), Saint) = 0)... (8.5) 


Sham (mn 1)— Snir —(n+1—r) it is clear that the first term on the 
right hand side of (3.5) is equal to the second; the latter, however, is 


Since S,—r = 


=Ë Pr{S,—r < 0 (r = 1,2,...,n—1), S,—n = —i, Syj—(n--1) = 0) 
al 
= Ë Prtit8,,—(n—r) > 0 (r= 1,2,..., n—1), i+ S,—n— 0}. 

i=1 


Pr{S,41—(n+1) = 0 | s,—n =—}} =È kizı gli, n). (3.6) 


The remaining term on the right hand side of (3.5) is 


nl 0 1 
=z & Pris,—r KO(r—1,2,.. 


ZË om), Sa MD) i}, 
PriS,—r > 0 (r = m+2, sy), Shyj—(n--1) = 0| Sma — (m1) = i} 
n-i œ w j 
= Zoli, teti X Ene, —e <0(r=1, 2,..., m),S,,—m ST Smyi—(m+1) = i}. 
ni 0 të ao 3 nn 
ao = gli, n—m) 2 kja g(j, m) ee = kisin G(G+i.n) ve (3.7) 
SË (v—2)k-gv—1, n) (3.8) 
where in (3.7) we have used the obvious property T+T; = Tij. From (3.6) and 
(3.8) we obtain (3.3) for n >1. Further, | á 
D 
we have Fo Ë FE = h4 È ik; Pr{(T; < co) 
a TI ogi 
=v 3 aren 
Bike HSK(Q<1 if as I, ae) 
From (3.4) we see that the Markov chain $\{Z_n\}$ is transient except when $\alpha = 1$;
moreover, the mean recurrence time of 0 is $E(N) = 1 + \sum_i i k_i E(T_{i-1}) = \infty$ when
$\alpha = 1$, so that in this case the chain is persistent null. It follows from (3.1) that the
series $\sum_{n=0}^{\infty} k_{n+j-i}^{(n)}$ converges except when $\alpha = 1$. More specifically we have the following
theorem.
Theorem 6: If « 41 we have 


E f Qa) f asi 
(a) = kë > 0), =a. (3.10) 
nse USK (Qi af er 
ms (1—a) ty oi 
(b) = kO i (j > 0). ALI) 
L USE HO fa>1 
Incidentally, Pr{U;< of<1 if ac. 
From the familiar identity F; = 2 POR PO (as 1) owe (3502) 


we obtain the following for special values of i, JË 
(i) Let i= 0,j = 0; then $ P§ = (—Foo) and using (3.4) we obtain 
(a) with ¿i = 0. 
(ii) Let i > 0, j= 0; then Di PË = Fy Š kf”, and, since F. io = Pr(7, ; <00} 
we obtain (a) for i > 0. 3 $ 
(ili) Let i = 0, j > 0; then Di PO) = Fo TK < Fk” since Py <1 
0 0 
for a 1; this gives (b). 
4. THE QUEUE GI|M|1


Let $t_0, t_1, t_2, \ldots$ be the epochs of arrival of the successive customers in the
system, and $Q_n$ denote the queue length at $t = t_n - 0$. Also, let $X_n$ be the number of
departures during the interval $(t_n, t_{n+1} - 0)$ (n = 0, 1, 2, ...); clearly $X_0, X_1, \ldots$ are
mutually independent and identical random variables, with

$k_j = \Pr\{X_n = j\} = \int_0^\infty e^{-\mu t} \frac{(\mu t)^j}{j!}\, dF(t)$ ($j \ge 0$). ... (4.1)

We have $K(z) = \sum_j k_j z^j = \psi(\mu - \mu z)$, $\alpha = E(X_n) = \rho^{-1}$, ... (4.2)

where $\psi(\theta)$ is the Laplace transform (L.T.) of dF(t) and $\rho$ is the relative traffic intensity;
further,

n — f -u HY ap çë 
i) J E j! (0) (4.3) 
(co) i ao n 
: ny Sue 
orf") — f ent wee Fal Hdt, KP) = | ent jë Fadi, ... (4.4) 
0 
where $F_n(t)$ is the n-fold convolution of F(t) with itself, and $F_0(t) = 0$ if $t < 0$ and
$= 1$ if $t \ge 0$.
For the Markov chain $\{Q_n\}$ we have the recurrence relations

$Q_{n+1} = Q_n + 1 - X_n$ if $Q_n + 1 - X_n > 0$, and $Q_{n+1} = 0$ if $Q_n + 1 - X_n \le 0$, ... (4.5)

whence we obtain

$Q_n = \max\{r - S_r\ (r = 0, 1, 2, \ldots, n-1),\ i + n - S_n\}$, ... (4.6)

where we have denoted $Q_0 = i$. The transition probabilities of $\{Q_n\}$ are therefore
given by


Pr{Qn <j|Qo = i} = Prfr—S, < j (r = 0,1,...,n—1), i+n—S, < j} 


= 3 Pr {jE Sr SO = 0,1,...,n—1), f+ 5,—-n = 
ve tl 


= Èu ae = MG: n—m) kg from (2.11) 


I 


am ET e = Ma n—m) am 


(4.7) 
a result which has not been explicitly obtained before. In particular, 
n 
Pr {Qn < jlo = 0} = 1— Z gj, m). ` (4.8) 
mel 
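The passage from the recurrence (4.5) to the maximum in (4.6) can be checked on sample paths. In the sketch below the partial sums in the maximum are taken over the most recent X's, that is $X_{n-1} + \cdots + X_{n-r}$; since the $X_n$ are independent and identically distributed this has the same distribution as $S_r$, which is how (4.6) is understood here. The function names are hypothetical.

```python
import random


def queue_by_recurrence(i, xs):
    """Apply Q_{n+1} = max(Q_n + 1 - X_n, 0) starting from Q_0 = i."""
    q = i
    for x in xs:
        q = max(q + 1 - x, 0)
    return q


def queue_by_maximum(i, xs):
    """Evaluate max{r - (X_{n-1} + ... + X_{n-r}) (r = 0..n-1), i + n - S_n}."""
    n = len(xs)
    recent = xs[::-1]          # recent[0] = X_{n-1}, recent[1] = X_{n-2}, ...
    candidates = []
    s = 0                      # sum of the r most recent X's
    for r in range(n):
        candidates.append(r - s)
        s += recent[r]
    candidates.append(i + n - s)   # here s equals S_n
    return max(candidates)
```

On any path the two functions return the same value of $Q_n$, which is the pathwise form of the identity behind (4.6).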
We define the zero-avoiding transition probabilities ${}^0P_{ij}^{(n)}$ of $\{Q_n\}$ as follows :
${}^0P_{ij}^{(n)} = \Pr\{Q_r > 0\ (r = 1, 2, \ldots, n-1),\ Q_n = j \mid Q_0 = i\}$. ... (4.9)
For j > 0 we have 
op) = Pr fi+r—S, > 0 (r = 1, 2, ..., n—1); i+n—8 = jt 
= Pr{(n—r)—8,-, <j (r= 1, 2,...,0—1), j+S,—n = i} 
2 : = Pr {j+8,—7 > 0 (r = Ly Buze n—1), j+S8,—n = i} 
f glj, n) if i=0 
= n—1 
ke), 5— a I); n—m) kn); if ‘> 0, jav (4.10) 
L mel 
while, for j = 0 we have 
opm — > haki E i . 
Pi) = Pr {fi+r— 8> 0 (7 = 1, 2, ven), i4+n—8, < 0} 
= Pri(8,—r gi (F—.1,2,..., n=l), Sy—n >i} 
=Pr{U;=n} 20 
ran (4.11) 
In particular, ${}^0P_{00}^{(n)}$ (n = 1, 2, ...) is the distribution of the number of customers served
during a busy period, and is therefore given by (2.16), which agrees with the result of
Takács (1961, p. 406). It follows that the chain $\{Q_n\}$ is persistent null if $\rho = 1$,
persistent non-null if $\rho < 1$, and transient if $\rho > 1$. The limiting queue length is given
by


Qo = lim Q, = max (r—S,), ve (412) 
n—> 00 r>0 


when we obtain 
Pro <j) = eller (r— 8) <j} = Pr{j+8,—r > 0, r > 0) 
E 
0 pal 


= Pri7, = co) = { 3 E ax e (4.13) 
aS P > 


which gives the familiar result for the stationary distribution of the queue-length. 
5. THE QUEUE M|G|1


Here, let $t_0, t_1, t_2, \ldots$ denote the epochs of successive departures, and $Q_n$ the
queue length at time $t = t_n + 0$. Further, let $X_n$ be the number of arrivals during
$(t_n + 0, t_{n+1})$ (n = 0, 1, ...); then $X_0, X_1, X_2, \ldots$ are mutually independent and identical
random variables, with the distribution (4.1), and $\alpha = E(X_n) = \rho$, the relative traffic
intensity.

We have

$Q_{n+1} = Q_n - 1 + X_n$ if $Q_n > 0$, and $Q_{n+1} = X_n$ if $Q_n = 0$, ... (5.1)

whence we obtain

$Q_n = \max\{X_{n-1} + X_{n-2} + \cdots + X_{n-r} - r + 1\ (1 \le r \le n-1),\ i + S_n - n\}$,

where $Q_0 = i$. Since $X_{n-1} + X_{n-2} + \cdots + X_{n-r} = S_n - S_{n-r} \sim S_r$, we can write

$Q_n = \max\{S_r - r + 1\ (r = 1, 2, \ldots, n-1);\ i + S_n - n\}$. ... (5.2)
For the limiting queue length we have 
Qe Sn (S,—7r+1) ; sx (5.8) 
the limiting distribution of queue length is therefore given by 
PrQa <J} = Pr {max (S—r+1) <j} = PS —r < j, r > 1) 
r> 
= Pr{U; = co). se (64) 
Hence we obtain 
; 0 if p>1 
Pr{Qe = = E s (5.5) 
= te Nel = Gh eres 
g if p>1 
J bli zt ndër pe OO ET a ok 
and = Pr{Qe SH { I=) Mf poi + 68) 
which is essentially the result proved by Finch (1960) by incomplete arguments. We
also note that when $\rho < 1$, we have the well-known result

$\sum_{j=0}^{\infty} z^j \Pr\{Q_\infty \le j\} = (1 - \rho) \frac{K(z)}{K(z) - z}$ ... (5.7)

as a consequence of (2.27); expanding (5.7) as a power series in z we obtain

$\Pr\{Q_\infty \le j\} = (1 - \rho) \sum_{n=0}^{j} k_{j-n}^{(-n)}$, ... (5.8)

where $k_j^{(-n)}$ is the coefficient of $z^j$ in the expansion of $K(z)^{-n}$; this alternative expression
may be more useful in particular cases.

The transition probabilities of {Q,} are given by 
PriQn < jlo =ù = Pr{S,—r+1 <j (r= 1, 2, ..., n—1), i+8,—n < J 


SË PiB,—r gj (—1,2,...,n—1), S,—n —j—v) 


oe 
= = Pr{ 8,_,—(n—r) >—v(r = 1, 2,...,n—1), S,—n = j—N 
vot 1 


= = Pry ti S,—r > O(r = 1, 2, ..., n—1), v+S8,—n = j}. ... (5.9) 


Hence, in particular when j = 0 we obtain 


PW = Pr{Q, = 0[Q) = i) = $ gv, n), (5.10) 


and for j > 0, 


° v 2 n-i 
Pr {Qn < 9 | Qo E i} — E i E gv, n—m) ko 


n-1 
= Kori X PË (5.11) 


[cf. Finch (1960), equation 16, and also in another context Yeo (1961). 


The present 
methods seem to be much more straightforward]. 


The zero-avoiding transition probabilities defined as in (4.9) are given by 


Pp — aoe > 0 (r= 1, 2, ...,n—1), i4+8,—n = j} (i > 1) 


Prlt+S,—r > 0 (r = 1,2,...,n—1), 14+-58,—-n = 7} (i = 0) ea) 
and these are given by (2.11), In particular, 
4 @ F, 
opa) — i,n)z © (et (ut) . 
® = gli, n= = f ajr O (n> i) e (5.13) 
gives the distribution of the number of customers served duri i 
rin; riod initi 
Era aring a busy period initiated 
6. PARTICULAR CASES


The queue D|M|1. Let F(t)=0 if ¿< AÀ, and =1 if ¢ >A. Then Kte) 
eG-ale, so that 


m= NY gate (ip) sm) NE (nie 
aj = Be es sjn) =Z e MR ee (6.1) 
Hence 
PriQ, <J1Q@=3= QË ere PË (nr pes). „= (6:2) 
where pene 
i ro in zj 
A= 4 n- A M ; Nur A @ --- (6.3) 
{2 mgmt) (n—j—mi” ifn >j.  K 


The queue E, |M\1. Let dF(t) = (KATHE eki pen qe j(k—1)1 (0<t< 00): 
then K(z) = ai(l— be) where b = (1+kp)“ and a = 1—b. 


La 42 ‘nk 
Hence ajo = D, (P e of.) E) arom, (6.0) 
r=j+1 
and 
PriQn < j |Q = të 
E —nkY So j (—jk—meN (—nk+jk+-mk 
= qik X cod ( Ë JI ( z ) ( Pees } (6.5) 
s=n-j+i+1 m= 


The queue M|D|1. Let the service time distribution be given by Ftë —0 
if ¿< À, and —l if t> AÀ. Then K(z) = e~/G-*), and hence 
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n) — YT VET (mp) 
PY = MAD ni (6.6) 
and 
Prò, KIIQOS 3 
ndj—i us a n-m+j m Swa 
E (no) —(n—m)p (NP—MP) Sh Lg 
EA D ET Dee (n—m-+j)! DI m(n—v)| (mp)™ 
0 =i 
n+j-i E 
=D e Ent Pis (8). sss (09) 
=o . . 
The limiting distribution of the queue length is given by

$\Pr\{Q_\infty \le j\} = (1 - \rho) \sum_{n=0}^{j} e^{n\rho} \frac{(-n\rho)^{j-n}}{(j-n)!}$ ($\rho < 1$) ... (6.8)

[cf. Syski (1960), p. 325].
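The limiting distribution (6.8) for the queue M|D|1 is easy to evaluate. The sketch below assumes the reading $\Pr\{Q_\infty \le j\} = (1-\rho)\sum_{n=0}^{j} e^{n\rho}(-n\rho)^{j-n}/(j-n)!$ and checks it against the stationary equations of the imbedded chain with Poisson($\rho$) arrivals per service; the function names are illustrative. The sum alternates in sign, so floating-point evaluation is reliable only for moderate j.

```python
import math


def md1_cdf(j, rho):
    """Pr{Q <= j} for the M|D|1 queue length at departure epochs, as in (6.8)."""
    total = 0.0
    for n in range(j + 1):
        total += math.exp(n * rho) * (-n * rho) ** (j - n) / math.factorial(j - n)
    return (1 - rho) * total


def md1_pmf(j, rho):
    """Pr{Q = j} obtained by differencing the distribution function."""
    return md1_cdf(j, rho) - (md1_cdf(j - 1, rho) if j > 0 else 0.0)


def poisson(m, rho):
    """k_m: probability of m arrivals during one (constant) service time."""
    return math.exp(-rho) * rho ** m / math.factorial(m)
```

For instance md1_pmf(0, rho) equals 1 - rho and md1_cdf(1, rho) equals (1 - rho) e^rho, the familiar first two values for this queue.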

The queue M|E,|1. Let the service time distribution be given by 
AP) = (b>)! eH -1 dt|(k—1)! (0 <t < œ); then K(z) = që (1—bejt, PË 
a = k(k+p)* and b = 1—a, i 
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= —nk es Aue 
Hence PE) = 2 5 Ta ar” (—b)"-”, i (6.9) 
n+j—i 
—nk ; 
and PriQ. <Q =i} =a" > ( i ) (—b) 
s=0 
pa a vs [—nk—jk+sk—mk) eo 
i py Lees ire ni (6.10) 
vi a!) al m ) ( s—m 
e) m= 


The limiting distribution of queue length is given by 


= k A 
PriQe < Jj} = (1—p) Ž, që (ae (—bi” (p< 1). so. (CY 


The usual expression for the limiting distribution in this case is a weighted sum of 
k geometric terms, the common ratios and the weights being obtained from a certain 
characteristic equation [see Syski (1960), p. 321]; however, the result (6.11) appears 
to be much simpler. 


Putting k = 1 in (6.10) we obtain the transition probabilities of $\{Q_n\}$ for the
queue M|M|1 as


ndj—i 
Pri, < jl =} = 0 >? ( i (=) 


s=0 


ndj—i si 3 CL td ES 
E E Y n+j—s ( n—j+s—m m+j—s (618) 


n+j—s+m m s—m 


where $a = (1+\rho)^{-1}$ and $b = 1-a$ (cf. Finch, 1960, equation (33)).


ON STABLE SEQUENCES OF EVENTS

SUMMARY. A sequence $\{A_n\}$ of events is called a stable sequence if for every event $B$ the limit
$\lim_{n\to+\infty} P(A_nB) = Q(B)$ exists. It is shown that in this case $Q$ is a bounded measure which is absolutely
continuous with respect to the underlying probability measure $P$. The Radon-Nikodym derivative $dQ/dP = \alpha$
is called the local density of the stable sequence $\{A_n\}$. Criteria for a sequence of events being stable are
given, and further examples of stable sequences are discussed. The notion of a stable sequence of events
generalizes the notion of a mixing sequence of events, introduced in a previous paper of the author. A
stable sequence is mixing if its local density is constant almost everywhere.


1. INTRODUCTION 


Let $[\Omega, \mathcal{A}, P]$ be a probability space in the sense of Kolmogoroff, i.e. let $\Omega$
be an arbitrary set whose elements shall be denoted by $\omega$ and called elementary events,
$\mathcal{A}$ a $\sigma$-algebra of subsets of $\Omega$ whose elements will be denoted by capital letters $A$, $B$
etc., and called random events or simply events, and $P = P(A)$ a measure, i.e. a non-negative
and $\sigma$-additive set function defined on $\mathcal{A}$ and normed by the condition
$P(\Omega) = 1$; $P(A)$ will be called the probability of the event $A \in \mathcal{A}$.

We shall denote by $\emptyset$ the empty set which represents the impossible event,
further by $A+B$ the union and by $AB$ the intersection of the sets $A$ and $B$. If
$A_n$ $(n = 1, 2, \ldots)$ is a sequence of sets, we shall use also the notation $\sum_n A_n$ for the union
of the sets $A_n$. We denote by $A-B$ the set of elements which belong to $A$ but not to
$B$ and put $\Omega - A = \bar{A}$. If $A$ and $B$ are arbitrary events such that $P(B) > 0$, we shall
denote by $P(A \mid B)$ the conditional probability of the event $A$ with respect to the condition
$B$, i.e. we put

$$P(A \mid B) = \frac{P(AB)}{P(B)}.$$


We shall denote by $a \in A$ that $a$ is an element of the set $A$ and by $A \subset B$ that
the set $A$ is a subset of the set $B$.

As usual a real function $\xi = \xi(\omega)$ defined on $\Omega$ is called a random variable if
it is measurable with respect to $\mathcal{A}$, that is, if denoting by $\xi^{-1}(I)$ the set of those $\omega \in \Omega$
for which $\xi(\omega) \in I$, then $\xi^{-1}(I)$ belongs to $\mathcal{A}$ if $I$ is an arbitrary interval of the real
line.

We denote by $E(\xi)$ the mean value (expectation) of the random variable $\xi$,
i.e. we put $E(\xi) = \int_\Omega \xi\, dP$.
The infinite sequence of events $A_1, A_2, \ldots, A_n, \ldots$, i.e. of subsets of $\Omega$ belonging
to $\mathcal{A}$, will be called a stable sequence, if the limit

$$\lim_{n\to+\infty} P(A_nB) = Q(B) \tag{1.1}$$

exists for every $B \in \mathcal{A}$. We shall show that in this case $Q(B)$ is a bounded measure on
$\mathcal{A}$, which is absolutely continuous with respect to the measure $P$, and thus

$$Q(B) = \int_B \alpha\, dP \tag{1.2}$$

for every $B \in \mathcal{A}$, where $\alpha = \alpha(\omega)$ is a measurable function on $\Omega$ such that $0 \le \alpha(\omega) \le 1$.
We shall call $\alpha(\omega)$ the local density of the stable sequence of events $\{A_n\}$.


As is well known, $\alpha(\omega)$ is not uniquely determined, but if (1.2) holds both with
$\alpha = \alpha_1(\omega)$ and $\alpha = \alpha_2(\omega)$ then $\alpha_1(\omega)$ and $\alpha_2(\omega)$ are almost everywhere equal to one another.
In the special case when the local density is constant, i.e. $\alpha(\omega) = \alpha$, then
$Q(B) = \alpha P(B)$ for every $B \in \mathcal{A}$, i.e. in this case

$$\lim_{n\to+\infty} P(A_nB) = \alpha P(B). \tag{1.3}$$

Sequences $\{A_n\}$ for which (1.3) holds have been considered already in a previous
paper (Rényi, 1958) and have been called strongly mixing sequences of events
with density $\alpha$. Thus the notion of a stable sequence of events is a generalization of
the notion of a mixing sequence.
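
As an illustration of the mixing relation $\lim P(A_nB) = \alpha P(B)$, the following sketch (not part of the original paper) takes $\Omega = [0, 1)$ with Lebesgue measure, lets $A_n$ be the set of points whose $n$-th binary digit equals 1, and takes $B = [0, 1/3)$. The choice of sets and the helper name `prob_An_and_B` are assumptions made only for this example; the exact computation suggests that $P(A_nB)$ approaches $P(B)/2$, i.e. density $1/2$.

```python
from fractions import Fraction

def prob_An_and_B(n, c):
    """Lebesgue measure of A_n ∩ [0, c), where A_n is the set of points
    in [0, 1) whose n-th binary digit is 1 (an illustrative choice)."""
    width = Fraction(1, 2 ** n)
    total = Fraction(0)
    for j in range(2 ** (n - 1)):
        left = (2 * j + 1) * width          # A_n is the union of [left, left + width)
        right = left + width
        # length of [left, right) ∩ [0, c); zero when left >= c
        total += min(right, c) - min(left, c)
    return total

c = Fraction(1, 3)                          # B = [0, 1/3), so P(B) = 1/3
ns = range(1, 13)
values = [prob_An_and_B(n, c) for n in ns]
errors = [abs(v - c / 2) for v in values]   # distance from alpha * P(B) with alpha = 1/2

for n, v, e in zip(ns, values, errors):
    print(n, v, float(e))
```

The printed errors shrink at least as fast as $2^{-(n+1)}$ in this example, which is what one expects from counting whole dyadic periods inside $B$.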


The definition of a stable sequence of events can be formulated also in the following
equivalent form: The sequence of events $\{A_n\}$ $(n = 1, 2, \ldots)$ is called stable if for every
event $B \in \mathcal{A}$ such that $P(B) > 0$ the conditional probability $P(A_n \mid B)$ tends to a
limit, i.e.

$$\lim_{n\to+\infty} P(A_n \mid B) = q(B) \tag{1.4}$$

exists. Clearly, if $P(B) > 0$ then (1.1) and (1.4) with $q(B) = Q(B)/P(B)$ are equivalent, while
if $P(B) = 0$ then (1.1) holds with $Q(B) = 0$ for any sequence $\{A_n\}$.

We shall show that stable sequences of events can be simply characterized
in terms of Hilbert space theory. Let $H$ denote the Hilbert space of all random
variables $\xi$, defined on the probability space $[\Omega, \mathcal{A}, P]$, for which $E(\xi^2)$ is finite, the
inner product $(\xi, \eta)$ being defined by $(\xi, \eta) = E(\xi\eta)$. Let $\alpha_n = \alpha_n(\omega)$ denote the
indicator of the set $A_n$, i.e. $\alpha_n(\omega) = 1$ for $\omega \in A_n$ and $\alpha_n(\omega) = 0$ for $\omega \in \bar{A}_n$. Then the
sequence $\{A_n\}$ of events is stable if and only if the sequence $\alpha_n$ converges weakly; the
weak limit of the sequence $\alpha_n$ being equal to the local density of the sequence $\{A_n\}$.
It follows that the sequence $\{A_n\}$ is mixing if and only if $\alpha_n$ converges weakly to a
constant.


We introduce further the notion of a stable sequence of random variables.
The sequence of random variables $\xi_n = \xi_n(\omega)$ $(n = 1, 2, \ldots)$ will be called stable if for
any event $B$ with $P(B) > 0$ the conditional distribution of $\xi_n$ with respect to $B$ tends
to a limiting distribution, i.e.

$$\lim_{n\to+\infty} P(\xi_n < x \mid B) = F_B(x) \tag{1.5}$$

for every $x$ which is a continuity point of the distribution function $F_B(x)$.

Expressed in terms of Hilbert space theory this means that for every bounded
and continuous function $g(x)$ the sequence $g(\xi_n)$ converges weakly.

Another equivalent definition of a stable sequence of random variables is
the following: the sequence of random variables $\xi_n$ $(n = 1, 2, \ldots)$ is called stable, if
for every $x \in X$, where $X$ is a set of real numbers which is everywhere dense on the
real line, the sequence of events $\xi_n < x$ $(n = 1, 2, \ldots)$ is stable.

Clearly such a sequence $\{\xi_n\}$ of random variables is stable in the sense
of (1.5), because if (1.5) holds for $x$ belonging to an everywhere dense set $X$ then
it holds for every continuity point $x$ of $F_B(x)$. On the other hand (1.5) implies the
stability of the sequence of events $\xi_n < x$ for $x \in X$ where the set $X$ is everywhere
dense on the real line.

In the special case when the limiting distribution $F_B(x)$ does not depend on
the choice of $B$ we arrive at the notion of a strongly mixing sequence of random
variables, introduced previously (Rényi and Révész, 1958).

The aim of the present paper is to study general properties of stable sequences 
of events and to give criteria for the stability of a sequence of events which are discussed 
in Section 2; some examples and applications of these notions in probability theory 
are discussed in Section 3. 

2. SOME GENERAL THEOREMS ON STABLE SEQUENCES OF EVENTS

Let $\alpha_n = \alpha_n(\omega)$ $(n = 1, 2, \ldots)$ be the indicator of the set $A_n$, i.e. $\alpha_n(\omega) = 1$
if $\omega \in A_n$ and $\alpha_n(\omega) = 0$ if $\omega \in \bar{A}_n$.

Let $H$ denote the Hilbert space of all random variables $\xi$ for which $E(\xi^2)$
exists, the inner product $(\xi, \eta)$ being defined by $(\xi, \eta) = E(\xi\eta)$. We put further
$\|\xi\| = (\xi, \xi)^{1/2}$. All definitions and theorems from Hilbert space theory which will
be needed in the sequel can be found, e.g., in Szőkefalvi-Nagy (1942).


We prove first the following theorem.

Theorem 1: $\{A_n\}$ is a stable sequence of events, i.e. the limit

$$\lim_{n\to+\infty} P(A_nB) = Q(B) \tag{2.1}$$

exists for every $B \in \mathcal{A}$, if and only if $\{\alpha_n\}$ is a weakly convergent sequence of elements of
the Hilbert space $H$, i.e. if for any $\eta \in H$ the limit

$$\lim_{n\to+\infty} (\alpha_n, \eta) = \Lambda(\eta) \tag{2.2}$$

exists.

Clearly, if $\beta$ is the indicator of the set $B$ then $(\alpha_n, \beta) = P(A_nB)$. Thus (2.2) reduces
to (2.1) if we substitute $\beta$ instead of $\eta$. Thus to prove Theorem 1 it suffices to show that
if the limit (2.2) exists whenever $\eta$ is the indicator of a set $B$ then it exists for every
$\eta \in H$. Clearly, if the limit (2.2) exists for every indicator $\eta$, it exists also if $\eta$ is an
arbitrary element of $H$ which takes on only a finite number of values. As to every random
variable $\eta$ for which $E(|\eta|) < +\infty$ and to every $\varepsilon > 0$ one can find, by the definition of
the Lebesgue integral (Halmos, 1950), a random variable $\eta_1$ which takes on only a finite
number of values, such that $E(|\eta - \eta_1|) < \varepsilon$, it follows easily that the limit (2.2) exists
not only for every $\eta \in H$ but also for every $\eta$ for which $E(|\eta|)$ is finite. Thus Theorem 1
is proved.

As a consequence of Theorem 1 we obtain the following theorem.

Theorem 2: If $\{A_n\}$ is a stable sequence of events, i.e. if

$$\lim_{n\to+\infty} P(A_nB) = Q(B) \tag{2.3}$$

exists for every $B \in \mathcal{A}$, then $Q(B)$ is a measure on $\mathcal{A}$ which is absolutely continuous with
respect to the measure $P$, and thus can be represented in the form

$$Q(B) = \int_B \alpha\, dP \tag{2.4}$$

where $\alpha = \alpha(\omega)$ is a random variable; we have further $0 \le \alpha \le 1$.

Proof of Theorem 2: Clearly $\Lambda(\eta)$ defined by (2.2) is a bounded linear operation
on $H$, and thus by a well-known theorem [Szőkefalvi-Nagy (1942)] there exists
an $\alpha \in H$ such that $\Lambda(\eta)$ can be represented in the form $\Lambda(\eta) = (\alpha, \eta)$. It is easy to see
that $0 \le \alpha \le 1$. It follows that, denoting by $\beta$ the indicator of the event $B$, we have

$$Q(B) = (\alpha, \beta) = \int_B \alpha\, dP. \tag{2.5}$$

Thus $Q(B)$ is a measure, which is absolutely continuous with respect to the measure
$P(B)$. We shall call $\alpha = \alpha(\omega)$ the local density of the stable sequence $\{A_n\}$.
Now we shall prove a criterion of the stability of a sequence of events which
is the generalization of a corresponding criterion for mixing sequences, proved by
the author (Rényi, 1958).

Theorem 3: Let $\{A_n\}$ $(n = 1, 2, \ldots)$ be a sequence of events such that the limit

$$\lim_{n\to+\infty} P(A_nA_k) = Q_k \tag{2.6}$$

exists for $k = 1, 2, \ldots$. Then the sequence $\{A_n\}$ is stable, i.e. (2.1) holds for every $B \in \mathcal{A}$.

Proof of Theorem 3: Let $H_1$ denote the subspace of $H$ spanned by the
sequence $\{\alpha_n\}$ where $\alpha_n$ is the indicator of the event $A_n$ $(n = 1, 2, \ldots)$, i.e. the closure
with respect to the distance $\|\xi - \eta\|$ of the set of all finite linear combinations
$\sum_{k=1}^{N} c_k\alpha_k$ where $c_1, c_2, \ldots, c_N$ are arbitrary real numbers. Let $H_2$ denote the set of
those elements $\xi_2$ of $H$ which are orthogonal to every $\xi_1 \in H_1$.

According to a well-known theorem (see Szőkefalvi-Nagy, 1942, p. 8) each
element $\xi$ of $H$ can be represented in the form $\xi = \xi_1 + \xi_2$ where $\xi_1 \in H_1$ and $\xi_2 \in H_2$.
Now we shall prove that if the limit (2.6) exists for $k = 1, 2, \ldots$ then the limit

$$\lim_{n\to+\infty} (\alpha_n, \xi) = \Lambda(\xi) \tag{2.7}$$

exists for every $\xi \in H$. To prove this it suffices to show that (2.7) exists if $\xi = \xi_1 \in H_1$,
because of the above mentioned decomposition of every $\xi \in H$ into the sum of a $\xi_1 \in H_1$
and a $\xi_2 \in H_2$: as a matter of fact if $\xi = \xi_2 \in H_2$ then (2.7) holds with $\Lambda(\xi_2) = 0$, while if
the limit (2.7) exists for $\xi = \xi_1$ and $\xi = \xi_2$ it clearly exists for $\xi = \xi_1 + \xi_2$ also.

Now if $\xi_1$ is a linear combination of a finite number of the $\alpha_k$'s then clearly
the limit (2.7) exists. Let now $\xi_1$ be an arbitrary element of $H_1$. Then to every
$\varepsilon > 0$ one can find a finite linear combination $\sum_{k=1}^{N} c_k\alpha_k$ such that

$$\Big\| \xi_1 - \sum_{k=1}^{N} c_k\alpha_k \Big\| < \varepsilon. \tag{2.8}$$

But (2.8) implies, in view of $\|\alpha_n\| \le 1$ and the inequality $|(\xi, \eta)| \le \|\xi\| \cdot \|\eta\|$, that

$$\Big| (\alpha_n, \xi_1) - \sum_{k=1}^{N} c_k(\alpha_n, \alpha_k) \Big| < \varepsilon. \tag{2.9}$$

Thus it follows that

$$\Big| \limsup_{n\to+\infty} (\alpha_n, \xi_1) - \liminf_{n\to+\infty} (\alpha_n, \xi_1) \Big| \le 2\varepsilon. \tag{2.10}$$

As $\varepsilon > 0$ is arbitrary it follows from (2.10) that the limit (2.7) exists. Thus Theorem
3 is proved.

Let us put (supposing $P(A_k) > 0$ for $k = 1, 2, \ldots$, which is no essential restriction)

$$q_k = \frac{Q_k}{P(A_k)} \qquad (k = 1, 2, \ldots) \tag{2.11}$$

where $Q_k$ is defined by (2.6).

Note that even if all the numbers $q_k$ $(k = 1, 2, \ldots)$ are equal, it is not sure that
the sequence $\{A_n\}$ is mixing. See for instance Examples 1 and 2 of Section 3. This
is true, however, in case $A_1 = \Omega$, as has been shown by the author (Rényi, 1958).
Another way to express this fact is contained in Theorem 4.


Theorem 4: Let $\{A_n\}$ be a stable sequence of events such that $\lim_{n\to+\infty} P(A_n) = q_0$
and $\lim_{n\to+\infty} P(A_nA_k) = q_kP(A_k)$ $(k = 1, 2, \ldots)$. Then the sequence $\{A_n\}$ is mixing if and
only if the numbers $q_k$ $(k = 0, 1, \ldots)$ are all equal to one another.

Proof of Theorem 4: The necessity of the condition follows immediately
from the definition of mixing sequences.

The sufficiency can be proved as follows: Let $\alpha$ denote the local density of
the sequence $\{A_n\}$. If $q_k = q \ne 0$ $(k = 0, 1, \ldots)$ then clearly

$$(\alpha, \alpha_k) = q(1, \alpha_k) \tag{2.12}$$

which can be also expressed as follows:

$$(\alpha, \alpha_k) = qP(A_k). \tag{2.13}$$

It follows by passing to the limit from (2.12) that

$$(\alpha, \alpha) = q(1, \alpha) \tag{2.14}$$

and from (2.13) that

$$(\alpha, \alpha) = q^2. \tag{2.15}$$

It follows that

$$(1, \alpha) = q \tag{2.16}$$

and therefore that

$$(\alpha, \alpha) = (1, \alpha)^2 \tag{2.17}$$

i.e.

$$\int_\Omega \alpha^2\, dP = \Big( \int_\Omega \alpha\, dP \Big)^2. \tag{2.18}$$

This implies that $\alpha$ is constant almost everywhere and thus by (2.15)

$$\alpha = q. \tag{2.19}$$

The case $q = 0$ is trivial.
We should like to add the following remark: In view of Theorem 1, Theorem 3 is
a special case of the following

Theorem: A bounded sequence $\{\alpha_n\}$ $(n = 1, 2, \ldots)$ of elements of a Hilbert
space $H$ is weakly convergent if and only if the limits $\lim_{n\to+\infty} (\alpha_n, \alpha_k)$ exist for $k = 1, 2, \ldots$.

The proof of this assertion is exactly the same as that of Theorem 3. This
useful theorem, which is due to E. Schmidt (see Schmeidler, 1954), is well known
in Hilbert space theory; e.g. the proof in Szőkefalvi-Nagy (1942, p. 10) of the theorem
that every bounded set is weakly compact is based essentially on this fact.


Let us add that if $\{A_n\}$ is a stable sequence of events and $\alpha_n$ the indicator of $A_n$,
then the sequence $\alpha_n$ converges strongly in $H$ only in the trivial case when the weak
limit $\alpha$ of $\alpha_n$ is almost everywhere equal either to 1 or to 0, e.g., in the case mentioned
in Example 1. As a matter of fact a necessary and sufficient condition of the strong
convergence of $\alpha_n$ to $\alpha$ is $\lim_{n\to+\infty} \|\alpha_n\| = \|\alpha\|$. But

$$\lim_{n\to+\infty} \|\alpha_n\|^2 = \lim_{n\to+\infty} (\alpha_n, 1) = \int_\Omega \alpha\, dP$$

and $\|\alpha\|^2 = \int_\Omega \alpha^2\, dP$. Thus $\alpha_n$ is strongly convergent to $\alpha$ if and only if
$\int_\Omega \alpha(1-\alpha)\, dP = 0$, i.e. if $\alpha(1-\alpha) = 0$ almost everywhere.


In view of Theorem 4 and of the well-known theorem of Hilbert space theory 
according to which every bounded set is weakly compact, the following theorem holds : 


Theorem 5: Any sequence $\{A_n\}$ of events contains a subsequence which is stable.


An interesting feature of the stability of a sequence of events is that unlike 
such properties as independence, equivalence etc., it remains invariant when the under- 
lying measure is replaced by another which is absolutely continuous with respect to 


the original measure. Moreover the local density of a stable sequence of events remains 
also unchanged. 


Theorem 6: Let $S = [\Omega, \mathcal{A}, P]$ be a probability space and $\{A_n\}$ a stable
sequence of events in $S$. Let $P^*$ be another probability measure on $\mathcal{A}$ which is absolutely
continuous with respect to $P$. Then the sequence $\{A_n\}$ is stable on the probability space
$S^* = [\Omega, \mathcal{A}, P^*]$ also, with the same local density, i.e. if (2.1) and (2.4) hold, one has also

$$\lim_{n\to+\infty} P^*(A_nB) = \int_B \alpha\, dP^*. \tag{2.20}$$

Proof of Theorem 6: By supposition

$$P^*(A) = \int_A \rho\, dP \quad \text{for } A \in \mathcal{A} \tag{2.21}$$

where $\rho = \rho(\omega)$ is a nonnegative random variable and $\int_\Omega \rho\, dP = 1$. It follows that

$$P^*(A_nB) = \int_B \alpha_n\rho\, dP = (\alpha_n, \rho\beta) \tag{2.22}$$

where $\beta$ is the indicator of the event $B \in \mathcal{A}$. Thus the existence of the limit (2.20) follows
evidently from the remark made in course of proving Theorem 1 that the limit (2.2)
exists not only if $\eta \in H$ but also under the single condition that $E(\eta)$ exists, and that we
have in this case also $\Lambda(\eta) = (\alpha, \eta)$. Thus it follows that

$$\lim_{n\to+\infty} P^*(A_nB) = (\alpha, \rho\beta) = \int_B \alpha\, dP^*.$$

This proves Theorem 6. Let us mention that for mixing sequences a more general
result supposing only the semi-continuity of $P^*$ with respect to $P$ has been proved by
Sucheston (1962).

1 Prof. B. Sz.-Nagy has kindly called my attention to this proof.

We want to make some further general remarks. It is impossible, except
in trivial cases, that the convergence in (2.1) should be uniform in $B$ for all $B \in \mathcal{A}$. As
a matter of fact, putting $B = A_n$ one has

$$P(A_nB) - Q(B) = P(A_n) - Q(A_n) = \int_\Omega (1-\alpha)\alpha_n\, dP$$

and this difference tends to $\int_\Omega \alpha(1-\alpha)\, dP$. Thus the convergence in (2.1) can be uniform
only if the local density $\alpha$ is almost everywhere equal to 1 or 0 as in the trivial
case of Example 1 in Section 3. Nevertheless the convergence in (2.1) may be uniform
in $B$ for $B \in \mathcal{A}_1$ where $\mathcal{A}_1$ is some proper subset of $\mathcal{A}$ which does not contain the sets $A_n$
themselves or only a finite number of them. For instance, if there exist events $B$
which are independent from all the events $A_n$, then these may all be contained in
$\mathcal{A}_1$. If $B$ is such an event then clearly the indicator $\beta$ of $B$ is uncorrelated with the local
density $\alpha$ of the sequence $\{A_n\}$.


3. EXAMPLES OF STABLE SEQUENCES OF EVENTS 

Example 1: A sequence of identical events $A_n = A$ $(n = 1, 2, \ldots)$ is evidently
stable. Note that in this case the local density $\alpha(\omega)$ is equal to 1 for $\omega \in A$ and to 0
for $\omega \in \bar{A}$. Let us mention that the sequence $A, A, \ldots$ is trivially mixing if $P(A) = 0$
or $P(A) = 1$ but not if $0 < P(A) < 1$.

More generally, if $A_n'$ is a mixing sequence of events with density $\alpha$, and $A$
any event, the sequence $A_n = A_n'A$ is a stable sequence of events, with local density
equal to $\alpha$ on $A$ and 0 on $\bar{A}$.

Example 2: Let $\{A_n\}$ be a sequence of equivalent events (called also symmetrically
dependent events), i.e. suppose

$$P(A_{i_1}A_{i_2} \cdots A_{i_k}) = W_k \tag{3.1}$$

for any choice of the different indices $i_1 < i_2 < \ldots < i_k$ and for $k = 1, 2, \ldots$, where $W_k$
depends on $k$ only, but not on the choice of the indices $i_1, i_2, \ldots, i_k$. Such a sequence
$\{A_n\}$ is evidently stable. As a matter of fact, it is sufficient to suppose the independence
of $W_k$ from the indices $i_1, i_2, \ldots, i_k$ for $k = 2$ only. This follows clearly from
Theorem 3.

Thus sequences of equivalent events are always stable. Note that in view of
Theorem 4 a sequence of equivalent events is mixing if and only if

$$W_2 = W_1^2. \tag{3.2}$$

It is easy to see however that (3.2) is satisfied if and only if the sequence $\{A_n\}$ is a
sequence of independent events.

As a matter of fact according to a well-known theorem due to Khintchine
(1952) if $\{A_n\}$ is a sequence of equivalent events and $W_k$ is defined by (3.1) then there
exists a distribution function $G(x)$ in the interval $[0, 1]$ such that

$$W_k = \int_0^1 x^k\, dG(x). \tag{3.3}$$

As a matter of fact this follows from the following theorem of Hausdorff (1923). If
a sequence $\{W_k\}$ is monotonic of every order, i.e. if

$$\Delta^n W_k = \sum_{j=0}^{n} (-1)^j \binom{n}{j} W_{k+j} \ge 0$$

for all $n \ge 0$ and $k \ge 0$, then $W_k$ can be represented in the form (3.3). Now clearly
$W_2 = W_1^2$ means by (3.3) that

$$\int_0^1 x^2\, dG(x) = \Big( \int_0^1 x\, dG(x) \Big)^2$$

which implies evidently that $G(x)$ is the distribution function of a constant $c$, i.e.

$$G(x) = \begin{cases} 0 & \text{for } x \le c, \\ 1 & \text{for } x > c, \end{cases}$$

where of course $c = W_1$; but then according to (3.3) $W_k = W_1^k$, that is

$$P(A_{i_1}A_{i_2} \cdots A_{i_k}) = \prod_{j=1}^{k} P(A_{i_j})$$

for every sequence $i_1 < i_2 < \ldots < i_k$ and therefore the events are independent. Thus
we have proved the following theorem.


Theorem 7: A sequence of equivalent events is always stable, but it is mixing if 
and only if the events are completely independent. 
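
The dichotomy in Theorem 7 can be checked on a small case. The sketch below (not part of the original paper) assumes a discrete mixing distribution $G$ with finitely many atoms, so that $W_k = \int_0^1 x^k\, dG(x)$ reduces to a finite sum; the atoms, the weights and the helper name `W` are hypothetical choices. The gap $W_2 - W_1^2$ is the variance of $G$, so it vanishes only when $G$ is concentrated at one point.

```python
from fractions import Fraction

def W(k, atoms):
    """k-th moment W_k = sum of weight * x**k for a discrete distribution G,
    given as a list of (atom x, weight) pairs."""
    return sum(weight * x ** k for x, weight in atoms)

# Hypothetical mixing distributions on [0, 1].
mixed = [(Fraction(1, 4), Fraction(1, 2)), (Fraction(3, 4), Fraction(1, 2))]
degenerate = [(Fraction(1, 2), Fraction(1))]

gap_mixed = W(2, mixed) - W(1, mixed) ** 2                  # variance of G, positive here
gap_degenerate = W(2, degenerate) - W(1, degenerate) ** 2   # zero: independent events

print(gap_mixed, gap_degenerate)
```

In the first case the events are equivalent but not independent, so by Theorem 7 the sequence is stable without being mixing; in the second case $W_k = W_1^k$ for every $k$.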

Example 3: Let $\Omega$ be the interval $(0, 1)$, $\mathcal{A}$ the set of measurable subsets of $\Omega$,
and $P$ the Lebesgue measure. Let the set $A_n$ be defined as the union of the intervals

$$\Big( \frac{k}{n}, \frac{k + \lambda(k/n)}{n} \Big) \qquad (k = 0, 1, \ldots, n-1)$$

where $\lambda(x)$ is a continuous function in the interval
$[0, 1]$ such that $0 \le \lambda(x) \le 1$. Then clearly the sequence $\{A_n\}$ is stable with local
density $\lambda(x)$.

This follows evidently as for any subinterval $I$ of $[0, 1]$ we have

$$P(A_nI) = \frac{1}{n} \sum_{k/n \in I} \lambda\Big(\frac{k}{n}\Big) + O\Big(\frac{1}{n}\Big) \tag{3.4}$$

and the first term on the right of (3.4) is a Riemann sum of the integral $\int_I \lambda(x)\, dx$.
Thus

$$\lim_{n\to+\infty} P(A_nI) = \int_I \lambda(x)\, dx. \tag{3.5}$$

It follows easily (e.g. by Theorem 3) that

$$\lim_{n\to+\infty} P(A_nB) = \int_B \lambda(x)\, dx \tag{3.6}$$

for every measurable set $B$, which proves our assertion.
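
The Riemann-sum argument of Example 3 can be tried numerically. The sketch below (not part of the original paper) assumes the particular density $\lambda(x) = x$ and the interval $I = (1/4, 3/4)$; both, as well as the helper names `lam` and `prob_An_and_I`, are illustrative assumptions. It measures the union of the intervals $(k/n, (k + \lambda(k/n))/n)$ inside $I$ exactly and compares the result with $\int_I \lambda(x)\, dx$.

```python
from fractions import Fraction

def lam(x):
    """Illustrative local density; any continuous function with values in [0, 1] would do."""
    return x

def prob_An_and_I(n, a, b):
    """Lebesgue measure of A_n ∩ (a, b), where A_n is the union of the
    intervals (k/n, (k + lam(k/n))/n) for k = 0, ..., n-1."""
    total = Fraction(0)
    for k in range(n):
        left = Fraction(k, n)
        right = (k + lam(Fraction(k, n))) / n
        lo, hi = max(left, a), min(right, b)
        if hi > lo:
            total += hi - lo
    return total

a, b = Fraction(1, 4), Fraction(3, 4)
target = (b * b - a * a) / 2      # integral of lam(x) = x over (a, b)
gaps = {n: abs(prob_An_and_I(n, a, b) - target) for n in (10, 100, 1000)}

for n, gap in gaps.items():
    print(n, float(gap))
```

For this choice the gap decreases roughly like $1/n$, in line with a left-endpoint Riemann sum plus a boundary term.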
Example 4: Let us consider a stationary Markov chain with a finite number of
states $1, 2, \ldots, s$. Let $\xi_0$, the state of the chain at time $t = 0$, have an arbitrary
distribution over the set of states. Let $\xi_n$ denote the state of the chain at time
$t = n$ $(n = 1, 2, \ldots)$. Let $A_n$ denote the event that the value of $\xi_n$ belongs to a set $E$
where $E$ is a proper subset of the set of states. Let $p_{ij}^{(k)}$ denote the probability of a
transition from state $i$ into state $j$ in $k$ steps.

Let us suppose that the limits

$$\lim_{k\to+\infty} p_{ij}^{(k)} = \pi_{ij}$$

exist for all $i$ and $j$; $\pi_{ij}$ may depend in general on $i$. It follows that

$$\lim_{n\to+\infty} P(A_nA_k) = \sum_{i=1}^{s} W_i \sum_{j \in E} p_{ij}^{(k)} \sum_{l \in E} \pi_{jl} \tag{3.7}$$

where $W_i$ is the probability that the chain started at time $t = 0$ in the state $i$. Thus
by Theorem 3 the sequence $\{A_n\}$ is stable. Note that in case $\pi_{ij}$ does not depend
on $i$, the sequence $\{A_n\}$ is mixing.
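
A small numerical sketch (not part of the original paper) may make Example 4 concrete. It assumes a hypothetical four-state chain with two closed classes, so that the limits of the $k$-step transition probabilities depend on the starting state; the matrix `P`, the initial distribution `W`, the set `E` and all function names are illustrative assumptions. The code compares $P(A_nA_k)$ for large $n$ with the limit $\sum_i W_i \sum_{j \in E} p_{ij}^{(k)} \sum_{l \in E} \pi_{jl}$, and then compares $q_1 = \lim_n P(A_nA_1)/P(A_1)$ with $q_0 = \lim_n P(A_n)$.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(size)) for j in range(size)]
            for i in range(size)]

def matpow(p, k):
    """k-step transition matrix."""
    size = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(k):
        result = matmul(result, p)
    return result

# Hypothetical chain with closed classes {0, 1} and {2, 3} (states numbered from 0).
P = [[0.9, 0.1, 0.0, 0.0],
     [0.5, 0.5, 0.0, 0.0],
     [0.0, 0.0, 0.2, 0.8],
     [0.0, 0.0, 0.6, 0.4]]
W = [0.25, 0.25, 0.25, 0.25]   # initial distribution W_i
E = [0, 2]                     # A_n = {state at time n lies in E}
STATES = range(4)
PI = matpow(P, 200)            # numerical stand-in for the limits pi_ij

def joint(n, k):
    """P(A_n A_k) for n > k."""
    pk, pd = matpow(P, k), matpow(P, n - k)
    return sum(W[i] * pk[i][j] * pd[j][l] for i in STATES for j in E for l in E)

def limit_formula(k):
    """Claimed limit of P(A_n A_k) as n grows."""
    pk = matpow(P, k)
    return sum(W[i] * pk[i][j] * PI[j][l] for i in STATES for j in E for l in E)

def prob(k):
    """P(A_k)."""
    pk = matpow(P, k)
    return sum(W[i] * pk[i][j] for i in STATES for j in E)

q0 = sum(W[i] * PI[i][l] for i in STATES for l in E)   # lim P(A_n)
q1 = limit_formula(1) / prob(1)

print(joint(300, 1), limit_formula(1), q0, q1)
```

Here $q_1 \ne q_0$, so by Theorem 4 this particular sequence is stable but not mixing, which matches the remark that mixing requires limits that do not depend on the starting state.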

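Example 5: Let $S = [\Omega, \mathcal{A}, P]$ be a probability space. Suppose that
$\Omega = \sum_{j=1}^{\infty} \Omega_j$, where $\Omega_j \in \mathcal{A}$ and $P(\Omega_j) > 0$ $(j = 1, 2, \ldots)$. Let $S_j = [\Omega_j, \mathcal{A}_j, P_j]$ be
the probability space obtained by putting

$$P_j(A) = P(A \mid \Omega_j) \quad \text{for } A \in \mathcal{A}_j \qquad (j = 1, 2, \ldots)$$

where $\mathcal{A}_j$ denotes the set of all $A \in \mathcal{A}$ such that $A \subset \Omega_j$.

Let $\{A_n^{(j)}\}$ be a mixing sequence of sets in the space $S_j$, with density $\alpha_j$, and put
$A_n = \sum_{j=1}^{\infty} A_n^{(j)}$. Then $\{A_n\}$ is a stable sequence of sets in $S$, with local density
$\alpha(\omega) = \alpha_j$ for $\omega \in \Omega_j$ $(j = 1, 2, \ldots)$.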

Clearly, Example 5 covers all cases in which the local density $\alpha$ of a stable
sequence of sets has a discrete distribution. As a matter of fact let $[\Omega, \mathcal{A}, P]$ be a
probability space and $\{A_n\}$ a stable sequence in this space with local density $\alpha$ where
$\alpha$ is a discrete random variable, taking on the different values $\alpha_j$ $(j = 1, 2, \ldots)$ with
positive probabilities. Let $\Omega_j$ denote the set of those $\omega$ for which $\alpha(\omega) = \alpha_j$. Put
$P_j(A) = P(A \mid \Omega_j)$ for $j = 1, 2, \ldots$. Then clearly for any $B \in \mathcal{A}$

$$\lim_{n\to+\infty} P_j(A_nB) = \lim_{n\to+\infty} \frac{P(A_nB\Omega_j)}{P(\Omega_j)} = \frac{Q(B\Omega_j)}{P(\Omega_j)} = \alpha_jP_j(B).$$

Thus the sequence $\{A_n\}$ is mixing in the probability space $[\Omega, \mathcal{A}, P_j]$ with density $\alpha_j$
$(j = 1, 2, \ldots)$.


Remarks: One can generalize Example 5 by splitting the probability
space into a non-denumerable instead of a denumerable set of probability spaces,
but in this case some care is necessary to avoid measure-theoretical difficulties.
However, in this way we arrive at a decomposition of a stable sequence of events
into the union of mixing sequences of events, in the most general case. This can be
seen as follows. Let $\{A_n\}$ be a stable sequence of sets with local density $\alpha(\omega)$. Then
one can define as usual the conditional probability of the event $B$ with respect to a
given value of $\alpha$, which will be denoted by $P_\alpha(B)$. $P_\alpha(B)$ is a random variable such that

$$P(AB) = \int_A P_\alpha(B)\, dP$$

for every $B \in \mathcal{A}$ and every $A \in \mathcal{A}_\alpha$, where $\mathcal{A}_\alpha$ is the least $\sigma$-algebra on which $\alpha$ is
measurable. In other words $P_\alpha(B)$ is the Radon-Nikodym derivative of the set function
$P(AB)$, with $B$ fixed, with respect to $P(A)$ on the $\sigma$-algebra $\mathcal{A}_\alpha$.

As is well known, while in case $B_k \in \mathcal{A}$, $B_kB_l = \emptyset$ for $k \ne l$ the relation

$$P_\alpha\Big( \sum_{k=1}^{\infty} B_k \Big) = \sum_{k=1}^{\infty} P_\alpha(B_k)$$

holds with probability 1, i.e., except for $\omega \in C$ where $C$ is a set such that $P(C) = 0$, nevertheless
one cannot say that $P_\alpha(B)$ is with probability 1 a measure, because the set $C$ of
exceptional values of $\omega$ may depend on the sequence $\{B_k\}$ and the union of all possible
such sets $C$ may have positive measure or even be of measure 1. As however $P_\alpha(B)$
is not uniquely determined and its value may be changed on a set of measure 0, it
is often possible to find a determination of $P_\alpha(B)$ such that it is with probability one
a measure. If this is the case it is easy to see that the sequence $\{A_n\}$ is almost surely
mixing with respect to the measure $P_\alpha(B)$, with density $\alpha$.

Such examples can be constructed by means of the theory of measurable de- 
compositions of Lebesgue-spaces, developed by Rochlin (1949). We do not propose 
to go into details here, but shall return to this question in another paper. 

However, we give one example of a stable sequence of random variables cons- 
tructed by the same principle as applied in the above Example 5 of stable sequence 
of events. 

Example 6: Let $\{\xi_n\}$ be a mixing sequence of random variables with limiting
distribution $F(x)$ and $\eta$ an arbitrary random variable having a discrete distribution.
Let further $g(u, v)$ be a continuous function of two variables. Then the sequence of
random variables

$$\zeta_n = g(\xi_n, \eta) \qquad (n = 1, 2, \ldots)$$

is strictly stable. As a matter of fact if the values taken on by $\eta$ with positive probability
are denoted by $y_k$ $(k = 1, 2, \ldots)$ and $B_k$ denotes the event $\eta = y_k$ we have

$$\lim_{n\to+\infty} P(\zeta_n < z \mid B) = \sum_{k=1}^{\infty} P(B_k \mid B) \int_{g(x, y_k) < z} dF(x). \tag{3.8}$$

We may take for instance $g(u, v) = u + v$ in which case we get

$$\lim_{n\to+\infty} P(\zeta_n < z \mid B) = \sum_{k=1}^{\infty} P(B_k \mid B)\, F(z - y_k) \tag{3.9}$$

respectively, we may take $g(u, v) = uv$, in which case, supposing that $y_k > 0$ for
$k = 1, 2, \ldots$, we obtain

$$\lim_{n\to+\infty} P(\zeta_n < z \mid B) = \sum_{k=1}^{\infty} P(B_k \mid B)\, F\Big( \frac{z}{y_k} \Big) \tag{3.10}$$

for all values of $z$ for which the function on the right hand side of (3.9) respectively
(3.10) is continuous.


DISCRIMINATION OF GAUSSIAN PROCESSES*


By C. RADHAKRISHNA RAO and V. S. VARADARAJAN
Indian Statistical Institute


SUMMARY. Equivalence and orthogonality properties of Gaussian processes are studied in a
general context and the results applied to the study of Gaussian measures on real Hilbert spaces.
Expressions for the likelihood ratios of two equivalent Gaussian measures on a real separable Hilbert
space are derived and the conditions under which their logarithms are quadratic forms are precisely
determined.


1. CONTRIBUTIONS OF MAHALANOBIS 
We briefly refer to Mahalanobis' pioneering work in multivariate analysis,
specially because the problem we have considered is closely linked with a distance
function known as Mahalanobis $D^2$ introduced by him in 1925, to study the affinities
or divergences between populations. Since then there have been a number of applications
of Mahalanobis $D^2$ in studying the inter-relationships of groups.


$D^2$ is a function of the characters measured on the individuals of a population
and in practice it is important to know the behaviour of $D^2$ as the number of characters
tends to infinity. Since the number of characters studied will always be limited the
classification of populations arrived at by using $D^2$ will be stable only if $D^2$ converges
to a stable value with increase in the number of characters. Mahalanobis (1937)
considered this problem and stated some conditions for convergence of $D^2$ as axioms
necessary for successful classification of populations. In a recent paper, Sneath and
Sokal (1962) describe a similar problem and lay down hypotheses for successful classification
of taxa, which are not very different from the axioms of Mahalanobis given
by him twenty-five years ago.

An interpretation of these axioms in terms of genetic factors affecting the observ- 
able characters was given by Rao (1954). The ideal conditions under which these 
axioms hold and the modification needed when the characters are affected by environ- 
ment were also given. It was further shown that the effect of environment is to reduce 
the distance (dissimilarity) between populations, a conclusion that must be considered 
seriously by Taxonomists. 


2. INTRODUCTION 


The simplest problem of discrimination consists in assigning an observation 
a to one of two n-variate normal populations. The discriminant function which pro- 
vides a dichotomy of the sample space for assigning observations to one or the other 
of the populations is the likelihood ratio of the densities of an n-dimensional variable 
with respect to the two populations. The discriminant function is linear in the obser- 
vations when the two populations differ only in the mean values and quadratic when 
the dispersion matrices are different. The computations involved in these cases are 
well known and straightforward. 


*This paper has been included in Contributions to Statistics, presented to Professor P. C. Mahalanobis
on the occasion of his 70th birthday.

However, there often arise in practical work situations where the observations
on an individual are of a more general nature, such as the growth of an individual 
organism measured continuously, contour of an individual's skull, facial profile etc. 
Each such observation provides a large number of auxiliary observations like the sizes 
of an organism at various time points and lengths of various diameters of a skull 
contour, on which finite dimensional techniques could be applied. The computations, 
however, become unwieldy if the number of auxiliary observations is large. Methods 
have, therefore, to be developed by which a dichotomy of the sample space of the 
compound observations can be obtained in an elegant manner. Various questions 
arise in this connection. 


For a fixed number n of auxiliary observations any dichotomy of the sample 
space will lead to some errors of classification. The errors, however, decrease as n 
increases. A question then arises as to whether in a given situation the errors $\to 0$
as $n \to \infty$, in which case perfect discrimination between populations is possible, or
the errors stabilize at certain values as n increases. An interesting and important 
problem is to investigate the necessary and sufficient condition under which perfect 
discrimination is possible on the basis of a compound observation. The second problem 
is that of utilizing the compound observation in an effective way, when perfect discrimination
does not obtain, to decide on the population from which it has arisen. Some
answers are provided for both these problems in this paper. References to previous 
work are given at appropriate places. 


3. MATHEMATICAL PRELIMINARIES 


The main result in this section is a limit theorem, which enables us to compute, 
in the case of infinite dimensional distributions, the Hellinger distance between a pair 
of probability distributions. The limit theorem is an easy consequence of the well- 
known Martingale convergence theorem. Both the theorems of this section are known 
and have been proved by Kraft (1955). We include their statements for the sake of 
completeness. 


We denote by $X$ an abstract space of points $x$ and by $\mathcal{B}$ a $\sigma$-algebra of subsets
of $X$. We shall be concerned with measures on $\mathcal{B}$. Since only finite measures are
important for our purposes we shall use the term measure to denote only a finite
measure. If $\mu$ and $\nu$ are two measures we say that $\mu$ is absolutely continuous with respect
to $\nu$, $\mu \ll \nu$ in symbols, if $\mu(A) = 0$ whenever $\nu(A) = 0$; $\mu$ is said to be equivalent
to $\nu$, $\mu \equiv \nu$ in symbols, if $\mu \ll \nu$ and $\nu \ll \mu$. If $\mu \ll \nu$, there is a $\mathcal{B}$-measurable
function $f$ on $X$, $\ge 0$, uniquely determined $\nu$-almost everywhere, such that $\mu(A)
= \int_A f\, d\nu$ for all $A \in \mathcal{B}$. Any such $f$ is called the derivative of $\mu$ with respect to $\nu$ and is
denoted by $d\mu/d\nu$. $\mu$ and $\nu$ are said to be orthogonal, $\mu \perp \nu$ in symbols, if there exists a
set $A \in \mathcal{B}$ such that $\mu(A) = \nu(X - A) = 0$. In order that $\mu \perp \nu$ it is sufficient that for each
$\varepsilon > 0$ there should exist a set $B_\varepsilon$ with $\mu(B_\varepsilon) < \varepsilon$ and $\nu(X - B_\varepsilon) < \varepsilon$; in fact if $A_k$ is
the set $B_\varepsilon$ with $\varepsilon = 2^{-k}$ and we write $A = \bigcap_{n=1}^{\infty} \bigcup_{k \ge n} A_k$, then $\mu(A) = \nu(X - A) = 0$.

If $\mu$ is a measure on $\mathcal{B}$, $f$ a $\mu$-integrable function on $X$, and $\mathcal{B}'$ a $\sigma$-algebra
included in $\mathcal{B}$, the conditional expectation of $f$ given $\mathcal{B}'$ under $\mu$, $E_\mu(f \mid \mathcal{B}')$ in symbols,
is any $\mathcal{B}'$-measurable function $f'$ such that $\int_A f'\, d\mu = \int_A f\, d\mu$ for all $A \in \mathcal{B}'$. From this
it follows at once that if $\nu$ is a measure on $\mathcal{B}$ such that $\nu \ll \mu$, and $\nu'$, $\mu'$ are the respective
restrictions of $\nu$, $\mu$ to $\mathcal{B}'$, then $d\nu'/d\mu' = E_\mu(d\nu/d\mu \mid \mathcal{B}')$.


Suppose now $p$ and $q$ are two measures on $\mathcal{B}$. Let $\lambda$ be any measure such
that $p \ll \lambda$, $q \ll \lambda$ (such measures always exist; for instance, $\lambda = p + q$). We
write $f = dp/d\lambda$, $g = dq/d\lambda$, and define

$$h(p, q) = \int_X (fg)^{1/2}\, d\lambda.$$

The function $h$ has many interesting properties. Before discussing some of these we
first remark that $h(p, q)$ is independent of the $\lambda$ used in defining it. In fact let us
write $h_\lambda(p, q)$ for the integral $\int (fg)^{1/2}\, d\lambda$. Suppose $\lambda'$ is another measure with $p \ll \lambda'$
and $q \ll \lambda'$. If we write $\mu = \lambda + \lambda'$, then $h_\mu(p, q) = \int (dp/d\mu \cdot dq/d\mu)^{1/2}\, d\mu =
\int (dp/d\lambda \cdot dq/d\lambda)^{1/2}\, (d\lambda/d\mu)\, d\mu = h_\lambda(p, q)$; and, by a similar reasoning, $h_\mu(p, q) = h_{\lambda'}(p, q)$.

We denote by $\mathcal{P}$ the set of all probability measures on $\mathcal{B}$. If $p, q \in \mathcal{P}$ we
shall call $h(p, q)$ the Hellinger distance between $p$ and $q$. Even though $h$ is not a distance
function over $\mathcal{P}$, $(1-h)^{1/2}$ is a distance function, thus providing a partial justification
for our nomenclature. The consideration of $h$ goes back to Hellinger (1909).
Obviously $h(p, q) \ge 0$. We write $H(p, q) = -\log h(p, q)$.

Theorem 3.1: (i) $0 \le h(p, q) \le 1$ for all $p, q \in \mathcal{P}$; (ii) if $\mathcal{B}'$ is a $\sigma$-algebra $\subset \mathcal{B}$
and $p'$, $q'$ are the restrictions of $p$, $q$ respectively to $\mathcal{B}'$, then $h(p', q') \ge h(p, q)$; (iii)
$(1-h)^{1/2}$ is a distance function over $\mathcal{P}$; (iv) $h(p, q) = 0$ if and only if $p \perp q$.

For the proof the reader may refer to Kraft’s paper (1955). 


Remark: Since $0 \le h(p, q) \le 1$ for all $p, q \in \mathcal{P}$, $0 \le H(p, q) \le \infty$. $H(p, q) = \infty$
if and only if $p \perp q$.
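
The properties listed in Theorem 3.1 are easy to check numerically on a finite sample space. The following is only an illustrative sketch and is not part of the original paper: it assumes a four-point space with counting measure as the dominating measure, and the names `hellinger_affinity` and `coarsen` are hypothetical helpers introduced for this example.

```python
import math

def hellinger_affinity(p, q):
    """h(p, q) = sum_x sqrt(p(x) q(x)); counting measure plays the role of lambda."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def coarsen(p, blocks):
    """Restrict p to the sub-sigma-algebra generated by a partition into index blocks."""
    return [sum(p[i] for i in block) for block in blocks]

# Two probability vectors on a four-point space (illustrative values only).
p = [0.1, 0.2, 0.3, 0.4]
q = [0.4, 0.3, 0.2, 0.1]
blocks = [[0, 1], [2, 3]]  # partition generating the coarser sigma-algebra B'

h_full = hellinger_affinity(p, q)
h_coarse = hellinger_affinity(coarsen(p, blocks), coarsen(q, blocks))
H_full = -math.log(h_full)          # H(p, q) = -log h(p, q)
dist = math.sqrt(1.0 - h_full)      # the distance function (1 - h)^(1/2)

# Measures with disjoint supports are orthogonal, so h vanishes.
h_orth = hellinger_affinity([0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5])

print(h_full, h_coarse, H_full, dist, h_orth)
```

Passing to the coarser $\sigma$-algebra can only increase $h$, in line with part (ii), and $h$ is zero for the pair with disjoint supports, in line with part (iv).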


In many applications $X$ is usually a space of functions and consequently
computation of $h$ becomes difficult. In such situations it often happens that the
measures on $\mathcal{B}$ are built up from measures on certain sub-$\sigma$-algebras of $\mathcal{B}$. The
following theorem examines this set up.


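Theorem 3.2: Let $\mathcal{B}_1 \subset \mathcal{B}_2 \subset \mathcal{B}_3 \subset \ldots$ be an increasing sequence of
sub-$\sigma$-algebras of $\mathcal{B}$ such that $\mathcal{B}$ is generated by $\bigcup_n \mathcal{B}_n$. Let $p$, $q$ be two probability
measures on $\mathcal{B}$ and $p_n$, $q_n$ their respective restrictions to $\mathcal{B}_n$. Then

$$h(p_1, q_1) \ge h(p_2, q_2) \ge \ldots$$

and

$$h(p, q) = \lim_{n\to\infty} h(p_n, q_n).$$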

Proof: Let $\lambda = p + q$ and $\lambda_n$ the restriction of $\lambda$ to $\mathcal{B}_n$. Let $f = dp/d\lambda$ and
$g = dq/d\lambda$. We then know that $dp_n/d\lambda_n = E_\lambda(f \mid \mathcal{B}_n)$ and $dq_n/d\lambda_n = E_\lambda(g \mid \mathcal{B}_n)$.
Write $f_n = dp_n/d\lambda_n$ and $g_n = dq_n/d\lambda_n$. Clearly $0 \le f, g, f_n, g_n \le 1$. By the martingale
convergence theorem $f_n \to E_\lambda(f \mid \mathcal{B}) = f$ $\lambda$-almost everywhere and $g_n \to g$ $\lambda$-almost
everywhere. Since $0 \le (f_n g_n)^{1/2} \le 1$ for all $n$ it follows that $h(p_n, q_n) \to h(p, q)$. The
decreasing character of the sequence $h(p_n, q_n)$ follows from Theorem 3.1.
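
As an illustration of Theorem 3.2 (not taken from the paper), consider two infinite coin-tossing measures, with $\mathcal{B}_n$ generated by the first $n$ tosses. The sketch below assumes independent tosses with success probabilities `a` and `b`, both hypothetical parameters, and computes $h(p_n, q_n)$ by brute-force enumeration; for such product measures the value factorizes as the $n$-th power of the one-toss affinity.

```python
import math
from itertools import product

def affinity_n_tosses(a, b, n):
    """h(p_n, q_n) for the laws of the first n tosses, by direct enumeration."""
    total = 0.0
    for outcome in product((0, 1), repeat=n):
        pn = math.prod(a if x else 1 - a for x in outcome)
        qn = math.prod(b if x else 1 - b for x in outcome)
        total += math.sqrt(pn * qn)
    return total

a, b = 0.5, 0.6  # assumed success probabilities under p and q
one_step = math.sqrt(a * b) + math.sqrt((1 - a) * (1 - b))
values = [affinity_n_tosses(a, b, n) for n in range(1, 9)]

for n, v in enumerate(values, start=1):
    print(n, v, one_step ** n)
```

The values decrease with $n$ and tend to 0 whenever `a` differs from `b`, so by Theorem 3.2 and part (iv) of Theorem 3.1 the two infinite product measures are orthogonal.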


4. GAUSSIAN PROCESSES 


We shall now apply the results of the preceding section to the study of a pair
of Gaussian processes. Suppose $\{\xi_t : t \in T\}$ is a collection of random variables on
$X$ such that $\mathcal{B}$ is the smallest $\sigma$-algebra of subsets of $X$ relative to which all the $\xi_t$
are measurable. A probability measure $p$ on $\mathcal{B}$ is said to be Gaussian relative to the
$\xi_t$ if for any $k$ and $t_1, \ldots, t_k \in T$, the joint distribution of $\xi_{t_1}, \ldots, \xi_{t_k}$ is a normal
distribution.

A typical and extremely useful class of examples is obtained when we take
$X$ to be a vector space over the reals and $\xi_t$ to be a collection of linear functionals
over $X$ closed under the formation of linear combinations. Defining $\mathcal{B}$ to be the
smallest $\sigma$-algebra of subsets of $X$ with respect to which all the $\xi_t$ are measurable,
we find that a probability measure $p$ on $\mathcal{B}$ is Gaussian if and only if each $\xi_t$ has a
normal distribution under $p$. Two special cases are of great importance in the
applications of our theory.


In the first example $X$ is a real separable infinite dimensional Hilbert space
and for each $a \in X$, $\xi_a$ is the linear functional $x \to (a, x)$ of $X$. $\mathcal{B}$ is then the class of all
Borel subsets of $X$. If $p$ is a Gaussian measure on $\mathcal{B}$ then there is an element $m \in X$
and a linear operator $A$ of $X$ such that $E_p(\xi_a) = (m, a)$ and $\mathrm{cov}_p(\xi_a, \xi_b) = (Aa, b)$
for all $a, b \in X$. $m$ is called the mean of $p$ and $A$ the dispersion operator of $p$. $A$ is
self-adjoint, non-negative definite and has finite trace. $m$ and $A$ uniquely determine
$p$ and to each $m \in X$ and self-adjoint, non-negative definite operator $A$ with finite
trace there corresponds a $p$ [cf. Prohorov (1956)]. In this case if $a_1, a_2, \ldots$ is any
sequence in $X$ whose linear combinations are dense in $X$, $\mathcal{B}$ is generated by sets of the
form $\xi_{a_n}^{-1}(B)$ ($B$ a Borel set on the line), and a measure on $\mathcal{B}$ is Gaussian if and only
if the joint distributions of the $\xi_{a_n}$ are normal.


In the second example $X$ is the space $C[a, b]$ of all continuous functions on $[a, b]$
and $\mathcal{B}$ the class of Borel subsets of $X$. For each signed measure $\alpha$ on $[a, b]$, $\xi_\alpha$ is the
linear functional $x \to \int_a^b x(t)\,d\alpha(t)$. A measure $p$ is Gaussian if and only if for all $k$
and all $t_1, \ldots, t_k \in [a, b]$, the joint distribution of $\xi_{t_1}, \ldots, \xi_{t_k}$ is normal, where for
any $t \in [a, b]$, $\xi_t$ is the functional $x \to x(t)$. If $p$ is Gaussian we can define $m(t) = E_p(\xi_t)$
and $K(s, t) = \mathrm{cov}_p(\xi_s, \xi_t)$. $m \in X$ and $K$ is a symmetric function continuous in the two
variables. $m$ is called the mean and $K$ the covariance function of $p$.

Denote by $\tilde{X}$ the Hilbert space of all functions $x$ on $[a, b]$ with $\int_a^b |x(t)|^2\,dt < \infty$.
There is an obvious inclusion $X \subset \tilde{X}$ and it is easily seen that measures on $X$ can be
regarded as measures on $\tilde{X}$. Even though the topological structures of $X$ and $\tilde{X}$ are
quite different this difference does not play any significant role in our applications.
For any measure $p$ on $\mathcal{B}$ we may denote by $\tilde{p}$ the corresponding measure on $\tilde{X}$. The
correspondence $p \to \tilde{p}$ preserves absolute continuity and hence equivalence and
orthogonality. If $p$ is the Gaussian measure with mean $m$ and covariance function $K$, the
measure $\tilde{p}$ is Gaussian in the Hilbert space $\tilde{X}$. $m$ is still the mean of $\tilde{p}$ while the
dispersion operator of $\tilde{p}$ is the (integral) operator $x \to Ax$ where

$$(Ax)(t) = \int_a^b K(t, u)\,x(u)\,du$$

for all $t$. A classical example is the Wiener measure with mean $m$ and variance
parameter $c$. It is the Gaussian measure on $C[0, 1]$ with mean $m$ and $K(s, t) =
c \min(s, t)$ for $s, t \in [0, 1]$; here $c > 0$ is a constant.

It is a consequence of the early work of Cameron and Martin (1944, 1945)
that if one considers on the space $C[0, 1]$ the two Wiener measures with means $m_1$
and $m_2$ and the same variance parameter $c > 0$, then the two measures are either
equivalent or orthogonal and that equivalence is obtained only if $m_1 - m_2$ is sufficiently
smooth. On the other hand, if the means are the same and the variance parameters
are different, then the measures are always orthogonal. These problems were examined
by Segal (1958) in a very general context where he obtained necessary and sufficient
conditions for equivalence.

Following up these results it was proved by Feldman (1958) and Hajek (1958)
that if two probability measures are Gaussian relative to the same set of random
variables, then they are either equivalent or orthogonal. Necessary and sufficient
conditions for equivalence were also obtained by these authors.


Our concern in this section is also with a pair of probability measures $p_1$ and $p_2$
which are Gaussian relative to the same set of random variables. If the two means
are $m_1$, $m_2$ and the dispersion matrices are $A_1$, $A_2$, we obtain conditions for equivalence
in terms of $A_1$, $A_2$, $m_1$ and $m_2$. In view of the results of Feldman and Hajek it is enough
to determine conditions for orthogonality and this is done using the Hellinger distance.


We now introduce the notations which we shall adhere to throughout the rest
of this section. $X$ is an abstract set, $\mathcal{B}$ a $\sigma$-algebra of subsets of $X$ and $\xi_1, \xi_2, \ldots$ a
sequence of random variables on $X$ such that $\mathcal{B}$ is the smallest $\sigma$-algebra of subsets
of $X$ relative to which all the $\xi_n$ are measurable. We denote by $\mathcal{B}_n$ the smallest
$\sigma$-algebra relative to which $\xi_1, \ldots, \xi_n$ are measurable. A probability measure on $\mathcal{B}$
is called Gaussian if it is Gaussian relative to the $\{\xi_n : n = 1, 2, \ldots\}$. If $p$ is any
Gaussian measure we can form

$$m_j(p) = E_p(\xi_j)$$

$$A_{jk}(p) = \mathrm{cov}_p(\xi_j, \xi_k).$$


SANKHYA : THE INDIAN JOURNAL OF STATISTICS : SERIES A 


We denote by $m(p)$ the sequence $(m_1(p), m_2(p), \ldots)$, called the mean of $p$, and by $A(p)$
the infinite matrix $(A_{jk}(p))$. For any integer $n \ge 1$ we write $A_n(p)$ for the $n \times n$
matrix $(A_{jk}(p))$, $1 \le j, k \le n$. Clearly $A_n(p)$ is the dispersion matrix of $(\xi_1, \ldots, \xi_n)$ under
$p$. $A(p)$ is called the dispersion matrix of $p$. We shall say that $p$ is non-singular
if $|A_n(p)| \ne 0$ for any $n$ (for any matrix $C$ we write $|C|$ for its determinant).


A typical example of our considerations is obtained when $X$ is the space of
all sequences $x = (x_1, x_2, \ldots)$ of real numbers and for any $n$, $\xi_n(x) = x_n$. $\mathcal{B}$ is the
smallest $\sigma$-algebra of subsets of $X$ relative to which all the $\xi_n$ are measurable. If
$m = (m_1, m_2, \ldots)$ is any sequence of real numbers and $A = (A_{jk})$ any infinite matrix
such that $A_n$ is positive definite for each $n$, there is a unique Gaussian measure on $\mathcal{B}$
with mean $m$ and dispersion matrix $A$. We shall denote this measure by $p(m, A)$.
$p(m, A)$ is clearly non-singular.


Suppose now that $p_1$ and $p_2$ are two non-singular Gaussian measures on $\mathcal{B}$.
We write $m_j = m(p_j) = (m_{j1}, m_{j2}, \ldots)$ and $A_j = A(p_j)$ $(j = 1, 2)$. We define

$$d_n = m_{1n} - m_{2n}, \quad n = 1, 2, \ldots$$

$$\delta = (d_1, d_2, \ldots)$$

and

$$\delta_n = (d_1, d_2, \ldots, d_n).$$

We next introduce the matrices

$$A = \tfrac12 (A_1 + A_2)$$

$$A_n = \tfrac12 (A_{1n} + A_{2n}).$$

We introduce the quantities

$$\rho_n = 2 \log |A_n| - \log |A_{1n}| - \log |A_{2n}|$$

$$D_n^2 = \delta_n A_n^{-1} \delta_n^t \quad \text{(for any matrix $C$, $C^t$ denotes the transpose of $C$).}$$

$D_n^2$ is the well-known Mahalanobis $D^2$ between two distributions in $n$-space having mean
difference $\delta_n$ and the same dispersion matrix $A_n$. Finally we write $p_{1n}$ and $p_{2n}$ for
the restrictions of $p_1$ and $p_2$ to $\mathcal{B}_n$. We recall that $\mathcal{B}_n$ is the smallest $\sigma$-algebra with
respect to which $\xi_1, \ldots, \xi_n$ are measurable.


Theorem 4.1: For each $n \ge 1$, we have

$$H(p_{1n}, p_{2n}) = \tfrac14 \rho_n + \tfrac18 D_n^2.$$

The sequences $\{\rho_n\}$ and $\{D_n^2\}$ are both non-negative and monotonic non-decreasing; and
if we write $\rho_\infty = \lim_{n\to\infty} \rho_n$ and $D_\infty^2 = \lim_{n\to\infty} D_n^2$, then

$$H(p_1, p_2) = \tfrac14 \rho_\infty + \tfrac18 D_\infty^2.$$

In order that $p_1 \equiv p_2$ it is necessary and sufficient that $\rho_\infty < \infty$ and $D_\infty^2 < \infty$ i.e.,
the sequences $\{\rho_n\}$ and $\{D_n^2\}$ be bounded. In other words $p_1$ and $p_2$ are equivalent if and
only if (i) the Gaussian measures $p(0, A_1)$ and $p(0, A_2)$ are equivalent and (ii) the Gaussian
measures $p(m_1, A)$ and $p(m_2, A)$ are equivalent.

Proof: If $p'_{1n}$ and $p'_{2n}$ are the distributions (in $n$-space) of $(\xi_1, \ldots, \xi_n)$ under
$p_1$ and $p_2$ respectively, it is easy to see that $h(p_{1n}, p_{2n}) = h(p'_{1n}, p'_{2n})$. Using the
formulae for the densities of $p'_{1n}$ and $p'_{2n}$ and applying standard techniques of computing
multiple integrals we obtain the formula for $H(p_{1n}, p_{2n})$. In view of Theorem
3.1 we know that $H(p_{1n}, p_{2n}) \ge 0$ and increases with $n$.

Substituting in the formula for $H(p'_{1n}, p'_{2n})$ the values $m_{1i} = m_{2i} = 0$ for
$i = 1, 2, \ldots$, we see that $\tfrac14 \rho_n$ is the value of $H$ between the $n$-dimensional
distributions of $p(0, A_1)$ and $p(0, A_2)$. Part (ii) of Theorem 3.1 now enables us to
conclude that $\rho_n \ge 0$ and increases with $n$. Similarly $\tfrac18 D_n^2$ is the value of $H$
between the $n$-dimensional distributions of $p(m_1, A)$ and $p(m_2, A)$. Consequently
$D_n^2 \ge 0$ and increases with $n$.

Theorem 3.2 now applies to give the result that $p_1 \perp p_2$ if and only if
$H(p_{1n}, p_{2n}) \to \infty$ i.e. if and only if either $\rho_\infty = \infty$ or $D_\infty^2 = \infty$. Since either $p_1 \perp p_2$ or
$p_1 \equiv p_2$ we may conclude that $p_1 \equiv p_2$ if and only if $\rho_\infty < \infty$ and $D_\infty^2 < \infty$.

Finally using the same arguments we conclude that $\tfrac14 \rho_\infty = H(p(0, A_1), p(0, A_2))$
and $\tfrac18 D_\infty^2 = H(p(m_1, A), p(m_2, A))$. This implies immediately that $p_1 \equiv p_2$ if and
only if $p(0, A_1) \equiv p(0, A_2)$ and $p(m_1, A) \equiv p(m_2, A)$. This completes the proof of the
theorem.
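
The formula of Theorem 4.1 can be checked numerically in the simplest case $n = 1$. The sketch below is an illustration only and not part of the paper: the means and variances are arbitrary assumed values, and `H_formula`, `normal_pdf` and `H_numeric` are hypothetical helper names. It compares $\tfrac14\rho_1 + \tfrac18 D_1^2$ with $-\log \int (fg)^{1/2}$ obtained by a midpoint rule.

```python
import math

def H_formula(m1, v1, m2, v2):
    """H = rho/4 + D^2/8 for two univariate normal laws (n = 1 in Theorem 4.1)."""
    v = 0.5 * (v1 + v2)                                   # averaged dispersion A_n
    rho = 2 * math.log(v) - math.log(v1) - math.log(v2)   # rho_n
    d2 = (m1 - m2) ** 2 / v                               # Mahalanobis D_n^2
    return 0.25 * rho + 0.125 * d2

def normal_pdf(x, m, v):
    """Density of the normal law with mean m and variance v."""
    return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

def H_numeric(m1, v1, m2, v2, lo=-40.0, hi=40.0, steps=200_000):
    """-log of the integral of sqrt(f g) over [lo, hi], by the midpoint rule."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += math.sqrt(normal_pdf(x, m1, v1) * normal_pdf(x, m2, v2))
    return -math.log(total * dx)

exact = H_formula(0.0, 1.0, 1.5, 4.0)
approx = H_numeric(0.0, 1.0, 1.5, 4.0)
print(exact, approx)
```

The two numbers agree to many decimal places for these parameter values; the integration range and step count are assumptions chosen to make the truncation error negligible here.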

Corollary: If $A_1 = A_2$, then $p_1 \equiv p_2$ or $p_1 \perp p_2$ according as $\lim_{n\to\infty} \delta_n A_{1n}^{-1} \delta_n^t$
$< \infty$ or $= \infty$.
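
A simple instance of the corollary, added here for illustration only: take $A_1 = A_2$ equal to the identity matrix, so that $D_n^2 = d_1^2 + \ldots + d_n^2$. The two mean-shift sequences below are assumed examples, and `d_squared` is a hypothetical helper name.

```python
import math

def d_squared(shift, n):
    """D_n^2 = sum_{k <= n} d_k^2 when A_1 = A_2 is the identity matrix."""
    return sum(shift(k) ** 2 for k in range(1, n + 1))

def summable(k):
    """d_k = 1/k: the squares sum to pi^2 / 6, so D_n^2 stays bounded."""
    return 1.0 / k

def not_summable(k):
    """d_k = 1/sqrt(k): the squares form the harmonic series, so D_n^2 diverges."""
    return 1.0 / math.sqrt(k)

for n in (10, 1_000, 100_000):
    print(n, d_squared(summable, n), d_squared(not_summable, n))
```

In the first case the measures are equivalent and in the second they are orthogonal, although each individual coordinate shift tends to zero in both cases.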

Remark: Kraft (1955) also computes $H(p_{1n}, p_{2n})$. The formula which he
derives is not however in the same form as ours.

We shall now examine more closely the conditions under which $\rho_\infty < \infty$.
Since $A_{1n}$ and $A_{2n}$ are, for each $n$, positive definite matrices, there exists a non-singular
matrix $S_n$ such that

$$S_n A_{2n} S_n^t = I_n$$

$$S_n A_{1n} S_n^t = L_n,$$

where $I_n$ is the $n \times n$ unit matrix and $L_n$ is a diagonal matrix. Let $\lambda_{n1}, \lambda_{n2}, \ldots, \lambda_{nn}$
be the diagonal elements of $L_n$. It is known that they are the roots of the equation (in $\lambda$)

$$|A_{1n} - \lambda A_{2n}| = 0$$

and that they can also be determined as the eigenvalues of the positive-definite
matrix $A_{2n}^{-1/2} A_{1n} A_{2n}^{-1/2}$. In particular $\lambda_{ni} > 0$ for all $n$ and $i$.

. in 
Theorem 4.2: With the above notation,

$$\rho_n = \sum_{i=1}^{n} \Big( 2 \log \frac{1 + \lambda_{ni}}{2} - \log \lambda_{ni} \Big).$$

In order that $\rho_\infty < \infty$ it is necessary and sufficient that there should be positive constants
$c$, $C$, $M$ such that

(a) $0 < c \le \lambda_{ni} \le C < \infty$ for all $n$ and $i$

(b) $\sum_{i=1}^{n} (\lambda_{ni} - 1)^2 \le M < \infty$ for all $n$.

Equivalently, $\rho_\infty < \infty$ if and only if there exists a positive constant $M'$ such that

$$\sum_{i=1}^{n} (\lambda_{ni} - 1)^2 / \lambda_{ni} \le M' < \infty \quad \text{for all } n.$$

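Proof: From the formula

$$\rho_n = 2 \log |A_n| - \log |A_{1n}| - \log |A_{2n}|$$

it is clear that $\rho_n$ remains unchanged if we replace $A_{1n}$ and $A_{2n}$ by $S_n A_{1n} S_n^t$ and
$S_n A_{2n} S_n^t$ respectively. The expression for $\rho_n$ in terms of the $\lambda_{ni}$ follows at once.

Suppose now there exists a constant $M' > 0$ such that $\sum_{i=1}^{n} (\lambda_{ni} - 1)^2/\lambda_{ni} \le M'
< \infty$ for all $n$. Rewriting $2 \log \{(1+u)/2\} - \log u$ as $\log \{(1+u)^2/4u\}$ and using the inequality
$\log (1+v) \le v$ for $v \ge 0$ we get

$$\rho_n = \sum_{i=1}^{n} \log \Big[ 1 + \frac{(\lambda_{ni} - 1)^2}{4\lambda_{ni}} \Big] \le \sum_{i=1}^{n} \frac{(\lambda_{ni} - 1)^2}{4\lambda_{ni}} \le \tfrac14 M'.$$

Thus $\rho_\infty < \infty$.

Conversely, let us suppose that $\rho_\infty < \infty$. This means that $\rho_n \le M''$ for all
$n$, $M'' > 0$ being a constant. Since $(1+u)^2 \ge 4u$, we have $2 \log \{(1+\lambda_{ni})/2\} - \log \lambda_{ni} \ge 0$
for all $n$ and $i$ and hence $2 \log \{(1+\lambda_{ni})/2\} - \log \lambda_{ni} \le M''$ for all $n$ and $i$. Since
$2 \log \{(1+u)/2\} - \log u \to \infty$ as $u \to 0$ and as $u \to \infty$ it follows that there are constants $c > 0$
and $C > 0$ such that $0 < c \le \lambda_{ni} \le C < \infty$ for all $n$ and $i$. But then

$$\log \lambda_{ni} - 2 \log \frac{1 + \lambda_{ni}}{2} = \log \frac{4\lambda_{ni}}{(1 + \lambda_{ni})^2} = \log \Big( 1 - \frac{(\lambda_{ni} - 1)^2}{(\lambda_{ni} + 1)^2} \Big)$$

$$\le - \frac{(\lambda_{ni} - 1)^2}{(\lambda_{ni} + 1)^2} \quad \text{(since $\log (1 - t) \le -t$ for $0 \le t < 1$)}$$

$$\le - \frac{(\lambda_{ni} - 1)^2}{(C + 1)^2}$$

so that

$$\rho_n \ge \frac{1}{(C+1)^2} \sum_{i=1}^{n} (\lambda_{ni} - 1)^2.$$

Consequently, if $M' = M''(C+1)^2/c$,

$$\sum_{i=1}^{n} (\lambda_{ni} - 1)^2/\lambda_{ni} \le M' < \infty$$

for all $n$. Thus the finiteness of $\rho_\infty$ is equivalent to the boundedness of $\sum_i (\lambda_{ni} - 1)^2/\lambda_{ni}$.

Now if $0 < c \le \lambda_{ni} \le C < \infty$ for all $n$ and $i$ and if $\sum_i (\lambda_{ni} - 1)^2 \le M < \infty$ for
all $n$, we see that $\sum_i (\lambda_{ni} - 1)^2/\lambda_{ni} \le M/c < \infty$ so that we may conclude that $\rho_\infty < \infty$.
On the other hand, if $\rho_\infty < \infty$, the analysis in the preceding paragraph shows that there are
constants $c, C > 0$ such that $0 < c \le \lambda_{ni} \le C < \infty$ for all $n$ and $i$ and that $\sum_i (\lambda_{ni} - 1)^2$
remains bounded as $n$ increases. The proof of the theorem is completed.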
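
The criterion of Theorem 4.2 can be explored numerically. The sketch below is illustrative and not from the paper: the two families of eigenvalues are assumed examples, and `rho`, `criterion`, `scaled` and `perturbed` are hypothetical names. The first family takes every $\lambda_{ni}$ equal to a constant different from 1, the second takes $\lambda_{ni} = 1 + 1/i$.

```python
import math

def rho(lams):
    """rho_n = sum_i [2 log((1 + lam_i) / 2) - log lam_i]."""
    return sum(2 * math.log((1 + l) / 2) - math.log(l) for l in lams)

def criterion(lams):
    """sum_i (lam_i - 1)^2 / lam_i, the quantity that Theorem 4.2 requires bounded."""
    return sum((l - 1) ** 2 / l for l in lams)

def scaled(n, c=2.0):
    """All eigenvalues equal to c: one dispersion matrix is c times the other."""
    return [c] * n

def perturbed(n):
    """Eigenvalues 1 + 1/i, for which (lam - 1)^2 / lam = 1 / (i (i + 1))."""
    return [1 + 1.0 / i for i in range(1, n + 1)]

for n in (10, 100, 1000):
    print(n, rho(scaled(n)), criterion(scaled(n)),
          rho(perturbed(n)), criterion(perturbed(n)))
```

For the constant family both quantities grow linearly in $n$, so $\rho_\infty = \infty$ and the measures are orthogonal; for the second family the criterion telescopes to $1 - 1/(n+1)$ and $\rho_n$ stays below a quarter of it, as in the first half of the proof.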

It follows from Theorem 4.1 that the Gaussian measures $p_1$ and $p_2$ are
orthogonal whenever $D_\infty^2 = \infty$. It is interesting to remark that this result remains
true even if $p_1$ and $p_2$ are not Gaussian. In fact we have

Theorem 4.3: Let $p_1$ and $p_2$ be two probability measures on $\mathcal{B}$ and let $\xi_1, \xi_2, \ldots$
be a sequence of random variables having finite second moments under both $p_1$ and $p_2$.
Let

$$m_{jn} = E_{p_j}(\xi_n) \quad (j = 1, 2)$$

$$d_n = m_{1n} - m_{2n}, \quad \delta_n = (d_1, \ldots, d_n)$$

and let $A_1$ and $A_2$ be the dispersion matrices and $A = \tfrac12 (A_1 + A_2)$. If $D_n^2 = \delta_n A_n^{-1} \delta_n^t \to \infty$,
then $p_1 \perp p_2$.

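Proof: Clearly it is enough to prove that for each $\varepsilon > 0$ there is a set $B_\varepsilon$
with $p_1(B_\varepsilon) + p_2(X - B_\varepsilon) < \varepsilon$. We shall construct, for each $n$, a set $B_n$ such that
$p_1(B_n) + p_2(X - B_n) \le 8/D_n^2$. Since $D_n^2 \to \infty$, this will show that $p_1 \perp p_2$.

Consider now, for a given $n$, $\xi_1, \ldots, \xi_n$. Let $S$ be a non-singular matrix with
entries $s_{jk}$ such that $S A_{1n} S^t = I$ and $S A_{2n} S^t = L$ where $I$ is the $n \times n$ unit matrix and
$L$ a diagonal matrix with entries $\lambda_1, \ldots, \lambda_n$. The $\lambda$'s are all $> 0$. Define $\eta_j = \sum_k s_{jk} \xi_k$.
Then $\eta_1, \ldots, \eta_n$ are uncorrelated under both $p_1$ and $p_2$. Write now

$$a_j = E_{p_1}(\eta_j), \quad b_j = E_{p_2}(\eta_j), \quad c_j = a_j - b_j,$$

and define $c = (c_1, \ldots, c_n)$. We have the equation $\delta_n = c\,(S^{-1})^t$ so that

$$D_n^2 = \delta_n A_n^{-1} \delta_n^t = c\,(S^{-1})^t A_n^{-1} S^{-1} c^t = c\,(S A_n S^t)^{-1} c^t = c\,\{(I + L)/2\}^{-1} c^t = 2 \sum_{j=1}^{n} c_j^2/(1 + \lambda_j).$$
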
Let $r_j = c_j/(1 + \lambda_j)$ and write $a = \sum_j r_j a_j$ and $b = \sum_j r_j b_j$. We may assume $a < b$
(the case where $a > b$ is treated in the same way, with the inequality defining $B_n$ reversed). Let
$B_n$ be the set defined by the inequality

$$r_1 \eta_1 + \ldots + r_n \eta_n > (a + b)/2.$$

Since this is equivalent to $r_1(\eta_1 - a_1) + \ldots + r_n(\eta_n - a_n) > (b - a)/2$, and since $\mathrm{var}_{p_1}(\eta_j) = 1$
for all $j$, we have, by Chebyshev's inequality,

$$p_1(B_n) \le 4 \sum_j r_j^2 / (b - a)^2.$$

Using a similar argument and remembering that $\mathrm{var}_{p_2}(\eta_j) = \lambda_j$ for all $j$, we get

$$p_2(X - B_n) \le 4 \sum_j r_j^2 \lambda_j / (b - a)^2.$$

Since $(b - a)^2 = \big( \sum_j c_j^2/(1 + \lambda_j) \big)^2$, we obtain

$$p_1(B_n) + p_2(X - B_n) \le 4 \sum_j r_j^2 (1 + \lambda_j) / (b - a)^2 = 4 \Big/ \sum_j \frac{c_j^2}{1 + \lambda_j} = 8/D_n^2.$$

As observed earlier, this proves that $p_1 \perp p_2$ and finishes the proof of the theorem.


5. GAUSSIAN MEASURES IN HILBERT SPACE 


We shall examine in this section questions relating to Gaussian measures in
a real, separable infinite dimensional Hilbert space $X$. If $A$ is the dispersion
operator of a Gaussian measure with mean $m$, it is clear that for any $u \in X$ with
$(Au, u) = (A^{1/2}u, A^{1/2}u) = 0$ the entire mass of the measure is concentrated on the set
$\{x : (x, u) = (m, u)\}$. Since only elementary arguments are needed to take care of such
situations, we shall systematically consider only those $A$ for which $A^{1/2}u$ does not vanish
unless $u = 0$. Such $A$ we shall call non-singular. Since $A$ is non-negative definite,
$A^{1/2}u = 0$ if and only if $Au = 0$. We introduce the partial ordering $\le$ in the
set of all self-adjoint operators in $X$; $A \le B$ if $B - A$ is non-negative definite
(denoted by $B - A \ge 0$). For any operator $A$ we write $R(A)$ for the range of $A$ i.e.
$R(A) = \{Au : u \in X\}$.

Let now $p_1$ and $p_2$ be Gaussian measures on $X$ with means $m_1$ and $m_2$ and
dispersion operators $A_1$ and $A_2$, which we shall assume to be non-singular. We write
$A = \tfrac12 (A_1 + A_2)$. It is easily seen that $A$ is also non-singular.
Let us write $\delta = m_1 - m_2$.
We now have the following theorem.


Theorem 5.1: In order that $p_1 \equiv p_2$ it is necessary and sufficient that both
the following conditions be satisfied:

(a) $\delta \in R(A^{1/2})$;

(b) there exists a bounded self-adjoint operator $T$ such that (i) $T \ge k \cdot I$ for
some constant $k > 0$ (ii) $\mathrm{tr}\,(T - I)^2 < \infty$ (iii) $A_2 = A_1^{1/2} T A_1^{1/2}$.

Proof: We know from Theorem 4.1 that in order that the equivalence
$p_1 \equiv p_2$ hold it is necessary and sufficient that (I) the Gaussian measures with means
$m_1$ and $m_2$ and dispersion operator $A$ are equivalent and (II) the Gaussian measures
with means 0 and dispersion operators $A_1$ and $A_2$ are equivalent. We shall prove
now that (a) is necessary and sufficient for (I) and (b) is necessary and sufficient for
(II).

Let $P_n$ be the projection on the spectral subspace $X_n$ of $A$ corresponding to the
interval $[1/n, \infty)$ on the real line. Let $p'_1$ and $p'_2$ be the Gaussian measures with means
$m_1$ and $m_2$ and dispersion operator $A$ and let $p'_{1n}$ and $p'_{2n}$ be their projections on $X_n$.
These are equivalent since $X_n$ is finite dimensional and $A$ is non-singular. Since they
have the same dispersion operator $A_n = P_n A P_n$ we have, after a direct computation,

$$H(p'_{1n}, p'_{2n}) = \tfrac18 (A_n^{-1} \delta_n, \delta_n) = \tfrac18 \| A_n^{-1/2} \delta_n \|^2$$

where $\delta_n = P_n \delta$. In other words, in order that the first equivalence holds it is necessary
and sufficient that $\| A_n^{-1/2} \delta_n \|^2$ stay bounded as $n$ increases. Let $x_n \in X_n$ be such that
$A_n^{1/2} x_n = \delta_n$. Since $X_n$ is a spectral subspace of $A$, $P_n$ commutes with $A^{1/2}$ and hence
$A^{1/2} x_n = \delta_n$. We shall show that $\delta \in R(A^{1/2})$ if and only if $\|x_n\|$ remains bounded as $n$
increases. For, suppose $\|x_n\| \le C < \infty$ for all $n$. Then there exists an $x \in X$ and a
sequence $n_1 < n_2 < \ldots$ such that $(x_{n_k}, u) \to (x, u)$ for all $u \in X$. Since $A^{1/2}$ is symmetric,
$(A^{1/2} x_{n_k}, u) = (x_{n_k}, A^{1/2} u)$ so that $(A^{1/2} x_{n_k}, u) \to (A^{1/2} x, u)$ for all $u \in X$. But $A^{1/2} x_{n_k} = \delta_{n_k}$ and
$\delta_{n_k} \to \delta$ as $k \to \infty$. Thus $A^{1/2} x = \delta$ proving that $\delta \in R(A^{1/2})$. On the other hand, if
there is an $x \in X$ such that $A^{1/2} x = \delta$, then the fact that $P_n$ commutes with $A^{1/2}$ implies
that $A^{1/2} P_n x = \delta_n$ and hence that $x_n = P_n x$. Thus $x_n \to x$ and hence $\|x_n\|$ remains
bounded as $n$ increases.

We now take up the second equivalence and show that for this to hold (b)
is necessary and sufficient. Let us denote the Gaussian measures with means 0 and
dispersion operators $A_1$ and $A_2$ by $q_1$ and $q_2$. Since $A_1^{-1/2}$ has a dense domain we can
find an orthonormal basis $e_1, e_2, \ldots$ for $X$ such that $e_n$ lies in the domain of $A_1^{-1/2}$ for
all $n$. Let $\xi_n$ be the linear functional $x \to (x, A_1^{-1/2} e_n)$. Write

$$t_{mn} = (A_2 A_1^{-1/2} e_m, A_1^{-1/2} e_n)$$

$$t'_{mn} = t_{mn} - \delta_{mn} \quad \text{(Kronecker delta)}$$

and let $T_n$ be the matrix $(t_{ij})_{1 \le i, j \le n}$. Moreover we have

$$\mathrm{cov}_{q_1}(\xi_m, \xi_n) = \delta_{mn}$$

$$\mathrm{cov}_{q_2}(\xi_m, \xi_n) = t_{mn}.$$

Now the linear combinations of the vectors $A_1^{-1/2} e_n$ are dense in $X$ and hence
we may conclude that the class of Borel sets in $X$ is generated by sets of the form
$\xi_n^{-1}(Y)$ ($Y$ a Borel set on the line). Theorems 4.1 and 4.2 now apply and enable us
to conclude that the measures $q_1$ and $q_2$ are equivalent if and only if (1) there exist
constants $k$ and $K$ such that $0 < k \le \lambda_{ni} \le K$ for all $n$ and $i$ and (2) there exists a
constant $K'$ such that $\sum_i (\lambda_{ni} - 1)^2 \le K' < \infty$ for all $n$; here $\lambda_{n1}, \ldots, \lambda_{nn}$ are the
eigenvalues of $T_n$. These conditions are obviously equivalent to (1') $k \cdot I_n \le T_n \le K \cdot I_n$
for all $n$ and (2') $\mathrm{tr}\,(T_n - I_n)^2 \le K' < \infty$ for all $n$, $I_n$ being the $n \times n$ unit matrix. We
shall complete the proof by showing that these are equivalent to (b).

Suppose there exists a $T$ satisfying (b). We have
$(Te_i, e_j) = (A_1^{1/2} T A_1^{1/2} A_1^{-1/2} e_i, A_1^{-1/2} e_j) = (A_2 A_1^{-1/2} e_i, A_1^{-1/2} e_j) = t_{ij}$.
Hence $(T'^2 e_i, e_i) = (T' e_i, T' e_i) = \sum_j t'^2_{ij}$ where $T' = T - I$. Since $\mathrm{tr}\,(T'^2)
< \infty$, $\sum_i (T'^2 e_i, e_i) < \infty$ and hence $\sum_i \sum_j t'^2_{ij} < \infty$. This shows that
$\mathrm{tr}\,(T_n - I_n)^2 = \sum_{i, j \le n} t'^2_{ij}$ remains bounded as $n$ increases, proving (2'). On the other
hand, $T \ge k \cdot I$ so that $(Tu, u) \ge k(u, u)$ for all $u$. If we consider this for all $u$ in the
subspace spanned by $e_1, \ldots, e_n$ we can deduce that $T_n \ge k \cdot I_n$. Since $T$ is bounded, $T \le K \cdot I$
for some $K$ and by an argument similar to the one just given we conclude that $T_n \le K \cdot I_n$. This
proves (1').

Suppose conversely that (1') and (2') are satisfied. Since $\mathrm{tr}\,(T_n - I_n)^2
= \sum_{i, j \le n} t'^2_{ij}$, (2') implies that $\sum_i \sum_j t'^2_{ij} < \infty$. Hence there exists a bounded operator
$T'$ such that $(T' e_i, e_j) = t'_{ij}$ and $\sum_i (T' e_i, T' e_i) = \sum_i \sum_j t'^2_{ij} < \infty$. Evidently $T'$ is self-adjoint
and hence $T'^2 \ge 0$ showing that $\mathrm{tr}\,T'^2 < \infty$. Write $T = T' + I$. Then $(Te_i, e_j) = t_{ij}$.
Since $k I_n \le T_n \le K I_n$ for all $n$, $k(u, u) \le (Tu, u) \le K(u, u)$ for all $u$ which are finite
linear combinations of the $e$'s. This proves that $kI \le T \le KI$. Finally
$(A_1^{1/2} T A_1^{1/2} A_1^{-1/2} e_i, A_1^{-1/2} e_j) = (Te_i, e_j) = t_{ij} = (A_2 A_1^{-1/2} e_i, A_1^{-1/2} e_j)$, so that if $x$ and $y$ are any
two elements in the linear manifold $X'$ generated by the $A_1^{-1/2} e_i$, $(A_1^{1/2} T A_1^{1/2} x, y) = (A_2 x, y)$.
Since $X'$ is dense in $X$, we can conclude that $A_2 = A_1^{1/2} T A_1^{1/2}$. This proves that (b)
is implied by (1') and (2'). As observed earlier this completes the proof of the fact
that the Gaussian measures with means 0 and dispersion operators $A_1$ and $A_2$ are
equivalent if and only if (b) is satisfied. Taken with the earlier proof that (a) is necessary
and sufficient for the equivalence of the Gaussian measures with means $m_1$ and $m_2$
and dispersion operator $A$, this proves the entire theorem.


Remarks: (1) It may be noticed that the condition (b) is not symmetric
in $A_1$ and $A_2$. However this is only an apparent difficulty. In fact one can prove
that if (b) is satisfied, there exists an operator $S$, bounded and self-adjoint, such that
(i) $S \ge k' \cdot I$ for some constant $k' > 0$ (ii) $\mathrm{tr}\,(S - I)^2 < \infty$ (iii) $A_1 = A_2^{1/2} S A_2^{1/2}$. Note that
$Q = A_1^{1/2} T^{1/2}$ is a bounded operator and $QQ^* = A_2$, so that using the polar decomposition
of $Q$ and the fact that $A_2^{1/2}$ has dense range, we deduce that $Q = A_2^{1/2} U$ where $U$ is
unitary. Thus $A_1^{1/2} = A_2^{1/2} U T^{-1/2}$ and consequently we have $A_1 = A_2^{1/2} U T^{-1} U^{-1} A_2^{1/2}$. Since $U$
is unitary and $kI \le T \le KI$, $S = U T^{-1} U^{-1}$ is a bounded self-adjoint operator
with $S \ge K^{-1} \cdot I$. It is easy to conclude from $T \ge k \cdot I$ and $\mathrm{tr}\,(T - I)^2 < \infty$ that
$\mathrm{tr}\,(T^{-1} - I)^2 < \infty$ so that $\mathrm{tr}\,(S - I)^2 < \infty$. Obviously $A_2^{1/2} S A_2^{1/2} = A_1$.

(2) It might be noticed that throughout the proof very little use has been made of
the fact that $A_1$ and $A_2$ have finite traces. The point is that if $A_1$ and $A_2$ do not have
finite traces, there are no Gaussian measures with these as dispersion operators. There
nevertheless exist Gaussian weak distributions (in the sense of Segal (1958)) with dispersion
operators $A_1$ and $A_2$ and the conditions (a) and (b) of Theorem 5.1 are necessary and sufficient
for the equivalence of the Gaussian weak distributions with means $m_1$ and $m_2$ and
dispersion operators $A_1$ and $A_2$. The proof of this needs only trivial modifications in the
proof of Theorem 5.1. In fact our proof that (b) is necessary and sufficient for the
equivalence of the Gaussian distributions with means 0 and dispersion operators
$A_1$ and $A_2$ does not use the existence of $\mathrm{tr}\,(A_1)$ and $\mathrm{tr}\,(A_2)$ and hence goes over without
changes. On the other hand, in the proof that (a) is necessary and sufficient for the
equivalence of the Gaussian distributions with means $m_1$ and $m_2$ and dispersion
operator $A$, the only place where compactness of $A$ is used is in deducing that the spectral
subspace $X_n$ of $A$ corresponding to $[1/n, \infty)$ is finite dimensional which in turn leads to
the finiteness of $H(p'_{1n}, p'_{2n})$ and to the formula $H(p'_{1n}, p'_{2n}) = \tfrac18 \| A_n^{-1/2} \delta_n \|^2$. If $A$
is not compact we may reach the same results by arguing differently. Notice that
over $X_n$, $A_n \ge (1/n) \cdot I$ so that $A_n$ has a bounded inverse thereon. It can be easily shown
in this case that $H(p'_{1n}, p'_{2n})$ is finite and equals $\tfrac18 \| A_n^{-1/2} \delta_n \|^2$. The rest of the proof
needs no change.

(3) It can be easily verified that condition (b) is equivalent to the conditions
of Feldman (1958) in the special context of Theorem 5.1 when $m_1$ and $m_2$ are 0.
Feldman's methods are however somewhat different from ours.

(4) If $A_1 = cA_2$ where $c > 0$ is a constant, then it follows easily from
Theorem 5.1 that the Gaussian weak distributions (means arbitrary) with dispersion
operators $A_1$ and $A_2$ are orthogonal whenever $c \ne 1$. This generalizes the classical
result concerning Wiener measures.

(5) If $A$ is the integral operator $x(t) \to \int_0^1 K(t, u)\,x(u)\,du$ where $K(s, t) =
\min(s, t)$, $0 \le s, t \le 1$, then it is a routine computation to obtain the eigenvalues
and eigenfunctions of $A$. It can then be proved easily that an $x \in C[0, 1]$ lies in
$R(A^{1/2})$ ($A$ is considered in $L_2(0, 1)$) if and only if $x$ is absolutely continuous and
$\int_0^1 |x'(t)|^2\,dt < \infty$, $x'$ being the derivative of $x$. In other words, the Wiener measures with
the same variance parameter and difference $\delta$ in means are equivalent if and only if $\delta$
is absolutely continuous and $\delta' \in L_2(0, 1)$.

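
The routine computation mentioned in Remark (5) can be sketched numerically. The following example is illustrative and not part of the paper. It relies on the standard eigen-decomposition of the kernel $\min(s, t)$ on $[0, 1]$, namely eigenvalues $\lambda_k = 1/((k - \tfrac12)\pi)^2$ with eigenfunctions $e_k(t) = \sqrt{2} \sin((k - \tfrac12)\pi t)$, which the text does not write out, and it takes the mean difference $\delta(t) = t$ as an assumed example. The helper names are hypothetical. The quantity computed is a partial sum of $\|A^{-1/2}\delta\|^2 = \sum_k (\delta, e_k)^2/\lambda_k$, which for this $\delta$ should approach $\int_0^1 |\delta'(t)|^2\,dt = 1$.

```python
import math

def inner_with_eigenfunction(delta, k, steps=4000):
    """(delta, e_k) in L_2(0, 1) by the midpoint rule,
    with e_k(t) = sqrt(2) sin((k - 1/2) pi t)."""
    omega = (k - 0.5) * math.pi
    h = 1.0 / steps
    return h * sum(
        delta((i + 0.5) * h) * math.sqrt(2) * math.sin(omega * (i + 0.5) * h)
        for i in range(steps)
    )

def range_norm_partial_sum(delta, n_terms):
    """sum_{k <= n} (delta, e_k)^2 / lam_k with lam_k = 1 / ((k - 1/2) pi)^2."""
    total = 0.0
    for k in range(1, n_terms + 1):
        omega = (k - 0.5) * math.pi
        total += (inner_with_eigenfunction(delta, k) * omega) ** 2
    return total

# Assumed mean difference delta(t) = t, whose derivative has squared L_2 norm 1.
value = range_norm_partial_sum(lambda t: t, 100)
print(value)
```

The partial sum with 100 terms falls just short of 1, the missing amount being the tail of the series; the number of terms and quadrature points are arbitrary choices made for this sketch.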
6. LIKELIHOOD RATIOS 


In the preceding sections we have been examining the conditions under which
two given measures are equivalent or orthogonal. From the point of view of
discrimination the case when equivalence obtains is the nontrivial one and there arises
the important question of deriving the expressions for the discriminant functions.
Mathematically, if $p_1$ and $p_2$ are the Gaussian measures, the problem is that of
computing the logarithm $L$ of the likelihood ratio $dp_1/dp_2$ as a function on the sample space.



We shall study this question in this section. We shall restrict ourselves to a concrete
situation, namely when the measures are defined on a real separable infinite dimensional
Hilbert space $X$. Some such restriction seems necessary if one's aim is to exhibit an
explicit formula for $L$. It would be of interest to obtain the form of $L$ in other
concrete situations.


In the case when $X$ is finite dimensional, as soon as the dispersion matrices
are nonsingular, the Gaussian measures in question are equivalent and it is easy to
write down $L$. Except for an additive constant, $L$ is a linear function if the
dispersion matrices are identical while the general case leads to an $L$ which is quadratic in
the observations. The situation is somewhat more involved in the infinite dimensional
case. The reason for this is that in an infinite dimensional vector space linear and
quadratic functions are not well behaved unless one imposes certain topological
restrictions on them. Thus, even though it is true that $L$ can be regarded as a
quadratic function nearly always, it will not in general be the quadratic form associated
with a reasonably well-behaved linear transformation. Theorems 6.1, 6.2 and 6.3,
which describe the precise conditions under which $L$ can be associated with well-behaved
analytic objects in the underlying Hilbert space and exhibit the expressions for $L$
under these circumstances, are the main results of this section.


It is a general feature of all the known results on likelihood ratios that the
computations are made on finite dimensional distributions and the final results obtained by
a passage to the limit. Such a method runs into difficulties when the Gaussian measures
on the Hilbert space have different dispersion operators mainly because the inverses
of the dispersion operators are unbounded in the Hilbert space. So far as we are
aware there have been no formulae of significant generality for these likelihood ratios.
We shall, in this section, give one such general formula. It is of interest to point out
that we do not rely upon the method of finite dimensional approximation but use a
more direct argument.


Throughout this section $X$ denotes a real separable infinite dimensional Hilbert
space. By a dispersion operator in $X$ we mean a linear operator $A \ge 0$ with finite
trace; $p_1$ and $p_2$ denote Gaussian measures with means $m_1$, $m_2$ and dispersion operators
$A_1$ and $A_2$ respectively. We write, as usual, $m = \tfrac12 (m_1 + m_2)$ and $A = \tfrac12 (A_1 + A_2)$.
We shall always assume that $A_1$ and $A_2$ are nonsingular as in Section 5.


We shall first examine the case when $A_1 = A_2 = A$. The results are known
(Grenander, 1952) but we give a discussion of this case since it serves to introduce
our point of view and prepare the ground for the more difficult case when $A_1 \ne A_2$.

Suppose then $p_1$ and $p_2$ are Gaussian measures with means $m_1$, $m_2$ and dispersion
operator $A$. We assume that $p_1 \equiv p_2$. In view of our results in Section 5 this means
that $\delta = m_1 - m_2 \in R(A^{1/2})$ and hence that $\| A^{-1/2} \delta \| < \infty$. If we write $L = \log (dp_1/dp_2)$,
then $L(x) = L'(x - m_2)$ where $L' = \log (dp'_1/dp'_2)$, $p'_1$ and $p'_2$ being Gaussian measures
with means $\delta$ and 0 and dispersion operator $A$. To compute $L'$ we may proceed as
follows. Since $A$ is a nonsingular dispersion operator we can find an orthonormal
basis $e_1, e_2, \ldots$ for $X$ such that for each $n$, $Ae_n = \lambda_n e_n$, $\lambda_n > 0$. Write $\xi_n(x) = \lambda_n^{-1/2} (x, e_n)$.
Then $\xi_1, \xi_2, \ldots$ are independent under both $p'_1$ and $p'_2$ and we have
$L'(x) = \lim_{n\to\infty} L'_n(x)$ for almost all $x$ where $L'_n$ is the logarithm of the likelihood ratio of
the restrictions of $p'_1$ and $p'_2$ to the smallest $\sigma$-algebra with respect to which $\xi_1, \ldots, \xi_n$
are measurable. It is easy to check that

$$L'_n(x) = \sum_{k=1}^{n} (A^{-1/2} \delta, e_k)\,\xi_k(x) - \tfrac12 \sum_{k=1}^{n} (A^{-1/2} \delta, e_k)^2.$$

The second term converges to $\tfrac12 \| A^{-1/2} \delta \|^2$. The first term consists of independent
summands with means 0 and variances $(A^{-1/2} \delta, e_k)^2$ $(k = 1, 2, \ldots)$ under $p'_2$, and since
$\sum_k (A^{-1/2} \delta, e_k)^2 < \infty$ the first term converges almost surely. Hence we have the expression

$$L'(x) = \sum_k (A^{-1/2} \delta, e_k)\,\xi_k(x) - \tfrac12 \| A^{-1/2} \delta \|^2$$

for $L'$. Since each $\xi_k$ is a linear function
it is reasonable to regard $L'$ as a linear function on $X$. However $L'$ is defined for almost
all $x$ and in general cannot be extended to $X$ as a continuous linear function. We
shall say that $L$ $(= \log dp_1/dp_2)$ is linear if there exists a continuous linear functional
$y$ on $X$ and a constant $c$ such that $L(x) = y(x) + c$ for almost all $x$.

Theorem 6.1: Let $p_1$ and $p_2$ be equivalent Gaussian measures with means
$m_1$ and $m_2$ and dispersion operator $A$. In order that $L = \log dp_1/dp_2$ be linear it is
necessary and sufficient that $\delta \in R(A)$ where $\delta = m_1 - m_2$. In that case,

$$L(x) = (x, A^{-1} \delta) - (m_2, A^{-1} \delta) - \tfrac12 (\delta, A^{-1} \delta)$$

for almost all $x$.

Proof: Let us first assume that $\delta \in R(A)$. Then

$$(x, A^{-1}\delta) = \sum_k (A^{-1}\delta, e_k)(x, e_k) = \sum_k (\delta, A^{-1}e_k)(x, e_k) = \sum_k \lambda_k^{-1}(\delta, e_k)(x, e_k)$$

and consequently $L'(x) = (x, A^{-1}\delta) - \frac{1}{2}\|A^{-1/2}\delta\|^2$. If we recall that $L(x) = L'(x - m_2)$ and note that $\|A^{-1/2}\delta\|^2 = (\delta, A^{-1}\delta)$ we obtain the expression

$$L(x) = (x, A^{-1}\delta) - (m_2, A^{-1}\delta) - \frac{1}{2}(\delta, A^{-1}\delta).$$

Conversely, let us assume that there exists a continuous linear functional $\gamma$ and a constant $c$ such that $L(x) = \gamma(x) + c$ for almost all $x$. Then $L'(x) = L(x + m_2) = \gamma(x) + c'$ where $c'$ is another constant. There exists $y \in X$ such that $L'(x) = (x, y) + c'$. Now for any $u$ in $X$, the joint probability distribution $\nu$ under $p_2'$ of $\xi$ and $\eta$, where $\xi(x) = (x, u)$ and $\eta(x) = (x, y)$, is Gaussian with means zero and dispersion matrix

$$\begin{pmatrix} (Au, u) & (Au, y) \\ (Au, y) & (Ay, y) \end{pmatrix}.$$

Since

$$\int \exp[(x, y) + i(x, u)]\,dp_2'(x) = \int \exp(\eta + i\xi)\,d\nu(\xi, \eta)$$

we conclude on the basis of an easy computation that the integral on the left side exists and is equal to $\exp[-\frac{1}{2}(Au, u) + \frac{1}{2}(Ay, y) + i(Ay, u)]$. Putting $u = 0$ we see that

$$\int \exp[(x, y)]\,dp_2'(x) = \exp[\tfrac{1}{2}(Ay, y)].$$

Since $L' = \log dp_1'/dp_2'$, we have

$$\int \exp[L'(x)]\,dp_2'(x) = 1$$

so that $L'(x) = (x, y) - \frac{1}{2}(Ay, y)$. The equation

$$\int \exp[(x, y) - \tfrac{1}{2}(Ay, y) + i(x, u)]\,dp_2'(x) = \exp[-\tfrac{1}{2}(Au, u) + i(Ay, u)]$$

now tells us that $p_1'$ is a Gaussian measure with mean $Ay$ and dispersion operator $A$ and consequently that $\delta = Ay$. This proves that $\delta \in R(A)$. The proof of the theorem is now complete.
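As a sanity check on the formula of Theorem 6.1, the following sketch compares it with the directly computed log density ratio in a finite-dimensional stand-in, where $R(A)$ is the whole space and the condition $\delta \in R(A)$ is automatic. The function names, the dimension and the randomly generated matrix are illustrative assumptions, not part of the paper.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log density of the n-dimensional normal N(mean, cov) at x."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = d @ np.linalg.solve(cov, d)
    return -0.5 * (len(x) * np.log(2.0 * np.pi) + logdet + quad)


def linear_log_lr(x, m1, m2, cov):
    """Theorem 6.1: (x, A^{-1}d) - (m2, A^{-1}d) - (d, A^{-1}d)/2, d = m1 - m2."""
    delta = m1 - m2
    a_inv_delta = np.linalg.solve(cov, delta)
    return x @ a_inv_delta - m2 @ a_inv_delta - 0.5 * delta @ a_inv_delta


rng = np.random.default_rng(0)
n = 5
b = rng.standard_normal((n, n))
A = b @ b.T + n * np.eye(n)            # positive definite stand-in for A
m1, m2, x = rng.standard_normal((3, n))

direct = gaussian_logpdf(x, m1, A) - gaussian_logpdf(x, m2, A)
formula = linear_log_lr(x, m1, m2, A)
print(direct, formula)
```

The two printed numbers should agree up to rounding; in infinite dimensions the same expression is available only when $\delta \in R(A)$.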

We now proceed to the general case when $A_1 \ne A_2$. In discussing this case we shall first assume that $m_1 = m_2 = 0$. Let $p_1$ and $p_2$ be Gaussian measures with means 0 and dispersion operators $A_1$ and $A_2$ respectively. In view of our results of Section 5 there exists a bounded self-adjoint operator $S$ with $S \ge kI$ for some constant $k > 0$ such that $A_1 = A_2^{1/2} S A_2^{1/2}$ and $\operatorname{tr}(S - I)^2 < \infty$. Let $e_1, e_2, \ldots$ be an orthonormal basis for $X$ such that $A_2 e_n = \lambda_n e_n$ for all $n$, $\lambda_n > 0$ being a constant. Write $f_n = A_2^{-1/2} e_n$, $\xi_n(x) = (x, f_n)$. Then $\xi_1, \xi_2, \ldots$ are independent with means 0 and variances 1 under $p_2$ while they have means 0 and covariances $s_{kl} = (Se_k, e_l)$ under $p_1$. Since $S$ has a pure point spectrum there is an orthonormal basis $g_1, g_2, \ldots$ of $X$ and constants $s_1, s_2, \ldots > 0$ such that $Sg_n = s_n g_n$ for all $n$. Write $g_j = \sum_r a_{jr} e_r$ and $\eta_j(x) = \sum_r a_{jr}\xi_r(x)$. Since $\sum_r (a_{jr})^2 < \infty$, $\eta_j$ is finite with probability one under $p_2$ and hence under $p_1$ also. Moreover the orthonormality of $g_1, g_2, \ldots$ implies that $\eta_1, \eta_2, \ldots$ are independent with means 0 and variances 1 under $p_2$. On the other hand

$$\operatorname{cov}_{p_1}(\eta_j, \eta_k) = \sum_{r,l} a_{jr} a_{kl} s_{rl} = \sum_{r,l} a_{jr} a_{kl}(Se_r, e_l) = (Sg_j, g_k) = \delta_{jk} s_k$$

(Kronecker delta). Consequently $\eta_1, \eta_2, \ldots$ are independent with means 0 and variances $s_1, s_2, \ldots$ under $p_1$. We may thus conclude that $L(x) = \lim_{n\to\infty} L_n(x)$ for almost all $x$ where

$$L_n(x) = -\frac{1}{2}\sum_{k=1}^{n}\Big(\frac{1}{s_k} - 1\Big)\eta_k(x)^2 - \frac{1}{2}\sum_{k=1}^{n}\log s_k.$$

Since $\operatorname{tr}(S - I)^2 < \infty$, $\sum_k (\frac{1}{s_k} - 1)^2 < \infty$ and hence $\sum_k \log s_k$ converges or diverges with $\sum_k (\frac{1}{s_k} - 1)$. Assume now that $S - I$ is of trace class, i.e. $\sum_k |\frac{1}{s_k} - 1| < \infty$. Then $\sum_k |\frac{1}{s_k} - 1|\,\eta_k(x)^2$ converges with probability one and hence we have

$$L(x) = -\frac{1}{2}\sum_{k}\Big(\frac{1}{s_k} - 1\Big)\eta_k(x)^2 - \frac{1}{2}\sum_{k}\log s_k.$$

If $Y$ is the set of all $x$ such that $\eta_k(x)$ is finite for all $k$ and $\sum_k |\frac{1}{s_k} - 1|\,\eta_k(x)^2 < \infty$, it is clear that $Y$ is a linear manifold of probability one and $L$ is, omitting an additive constant, expressed as an absolutely convergent series of summands each of which is a constant multiple of the square of a linear function. In this sense it is possible to regard $L$ as a quadratic function. Notice that the condition that $S - I$ is of trace class, which was imposed during the course of the above discussion, is stronger than the condition $\operatorname{tr}(S - I)^2 < \infty$ which is necessary and sufficient for equivalence of $p_1$ and $p_2$.

This quadratic function is not in general the quadratic form associated with any linear transformation of $X$. When $X$ is finite-dimensional, $L$ is, up to an additive constant, the quadratic form associated with the matrix $A_1^{-1} - A_2^{-1}$. In the infinite dimensional case $A_1^{-1}$ and $A_2^{-1}$ are unbounded operators and consequently $A_1^{-1} - A_2^{-1}$ need not be even defined for a large class of vectors. It is therefore not surprising that one is not always able to exhibit $L$ as a quadratic form. Given a function $f$ on $X$ we shall say that $f$ is a quadratic form if there exists a closed, densely defined, symmetric operator $A$ (i.e., $(Ax, y) = (x, Ay)$ for all $x, y \in D(A)$)† such that (i) $p_2(D(A)) = 1$, (ii) $f(x) = (Ax, x) + c$ for some constant $c$ and almost all $x$. The question that we want to examine now is this: when is $L$ a quadratic form in the sense of this definition?


We proceed to prove a series of lemmas. $p_1$ and $p_2$ are equivalent Gaussian measures with means 0 and dispersion operators $A_1$ and $A_2$; $e_1, e_2, \ldots$ is an orthonormal basis for $X$ such that $A_2 e_n = \lambda_n e_n$ for all $n$, $\lambda_n > 0$ a constant. Since $\operatorname{tr}(A_2) < \infty$, for any projection $P$ in $X$, $\operatorname{tr}(A_2^{1/2} P A_2^{1/2}) < \infty$. We shall denote it by $\pi(P)$. It is easy to show that $\pi(P) = \operatorname{tr}(A_2 P)$. $\pi$ is countably additive over mutually orthogonal projections.

Lemma 1: The following statements on a closed densely defined operator $A$ are equivalent.

(i) $p_2(D(A)) = 1$.
(ii) $AA_2^{1/2} e_n$ is defined for all $n$ and $\sum_n \|AA_2^{1/2} e_n\|^2 < \infty$.
(iii) $M = AA_2^{1/2}$ is defined everywhere, is a bounded operator and $\operatorname{tr}(M^*M) < \infty$.

If $A$ is in addition self-adjoint then these are also equivalent to

(iv) $\int t^2\,d\alpha(t) < \infty$

where $\alpha$ is the measure on the real line defined by $\alpha(E) = \pi(P_E)$, $P$ $(E \to P_E)$ being the spectral measure of $A$.

Proof: If $M$ is a bounded operator then $\operatorname{tr}(M^*M) < \infty$ if and only if $\sum_n (M^*Me_n, e_n) = \sum_n \|Me_n\|^2 < \infty$ and consequently (iii) $\Rightarrow$ (ii). Conversely let us assume that (ii) is valid. If $x$ is a finite linear combination, say $a_1 e_1 + \ldots + a_r e_r$, of norm 1, so that $\sum_{i=1}^{r} |a_i|^2 = 1$, then $M$ is defined for $x$ and

$$\|Mx\|^2 \le \Big(\sum_{i=1}^{r} |a_i|\,\|Me_i\|\Big)^2 \le \sum_{i=1}^{r}\|Me_i\|^2 \cdot \sum_{i=1}^{r}|a_i|^2 \le C$$

where $C = \sum_{i=1}^{\infty}\|Me_i\|^2$. Therefore $\|Mx\|^2 \le C\|x\|^2$ for all finite linear combinations $x$ of the $e_i$'s. Since $A$ is closed, it follows easily from this that $M = AA_2^{1/2}$ is defined everywhere and $\|Mx\|^2 \le C\|x\|^2$ for all $x$. Since $\sum_n \|Me_n\|^2 < \infty$, (iii) follows immediately. This proves that (ii) $\Rightarrow$ (iii).

† $D(A)$ is the domain of definition of $A$.

If $A$ is closed and densely defined there exists a self-adjoint $H$ such that $D(H) = D(A)$ and $\|Hu\| = \|Au\|$ for all $u \in D(A)$; in fact we may take $H = (A^*A)^{1/2}$ (Riesz-Nagy (1952)). This means that for the equivalences (i) $\Leftrightarrow$ (iii) $\Leftrightarrow$ (iv) it may be assumed that $A$ is self-adjoint. We shall do so.

Suppose now (i) is true. We shall prove that (i) $\Rightarrow$ (iv). For any $x \in X$ let $\nu_x$ be the measure $E \to \|P_E x\|^2$ on the line, $P$ being the spectral measure of $A$. We know that $D(A) = \{x : \int t^2\,d\nu_x(t) < \infty\}$. Write $D_j = \{t : j \le |t| < j + 1\}$, $j = 0, 1, 2, \ldots$ Then $x \in D(A)$ if and only if $\sum_j \int_{D_j} t^2\,d\nu_x(t) < \infty$. Since $p_2(D(A)) = 1$ and $\int_{D_j} t^2\,d\nu_x(t) \ge j^2\nu_x(D_j)$, it follows that $\sum_j j^2\nu_x(D_j) < \infty$ for almost all $x$. Now let $h_{jk}$ $(k = 1, 2, \ldots)$ be an orthonormal basis for the range of $P_{D_j}$ and let $v_{jk}(x) = (P_{D_j}x, h_{jk})$. Then $\sum_j j^2 \sum_k (v_{jk}(x))^2 = \sum_j j^2\nu_x(D_j) < \infty$ with probability one. Since the joint distributions of the $v_{jk}$ are all Gaussian and since $E_{p_2}(v_{jk}) = 0$ we easily conclude that $\sum_{j,k} j^2 E_{p_2} v_{jk}^2 < \infty$, i.e. that $\sum_j j^2 E_{p_2}(\phi_j) < \infty$, where $E_{p_2}$ denotes expectation with respect to $p_2$ and $\phi_j(x) = \nu_x(D_j)$. Since $E_{p_2}(\phi_j) = \pi(P_{D_j}) = \operatorname{tr}(P_{D_j}A_2) = \alpha(D_j)$, we get $\sum_j j^2\alpha(D_j) < \infty$. This proves that $\int t^2\,d\alpha(t) < \infty$.

Conversely from the finiteness of $\int t^2\,d\alpha(t)$ we may conclude that $\sum_j (j + 1)^2\alpha(D_j) = \sum_j (j + 1)^2 E_{p_2}(\phi_j) < \infty$, from which the convergence for almost all $x$ of $\sum_j \int_{D_j} t^2\,d\nu_x(t) = \int t^2\,d\nu_x(t)$ follows at once. This proves that (iv) $\Rightarrow$ (i) and completes the proof that (i) $\Leftrightarrow$ (iv).
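We now proceed to the equivalence (iv) $\Leftrightarrow$ (iii). If $\|x\| = 1$ and $y = A_2^{1/2}x$, then $\nu_y(E) = \|P_E A_2^{1/2}x\|^2 = (A_2^{1/2}P_E A_2^{1/2}x, x) \le \operatorname{tr}(A_2^{1/2}P_E A_2^{1/2}) = \pi(P_E) = \alpha(E)$. Thus, if $\int t^2\,d\alpha(t) < \infty$, then $\int t^2\,d\nu_y(t) < \infty$ for $y = A_2^{1/2}x$, $x$ being arbitrary and of norm 1. This proves that $AA_2^{1/2}$ is defined everywhere. Moreover, if $\alpha_n = \sum_{j=1}^{n}\nu_{y_j}$ with $y_j = A_2^{1/2}e_j$, then $\alpha_n(E) = \sum_{j=1}^{n}(A_2^{1/2}P_E A_2^{1/2}e_j, e_j) \le \operatorname{tr}(A_2^{1/2}P_E A_2^{1/2}) = \alpha(E)$, so that $\sum_{j=1}^{n}\|AA_2^{1/2}e_j\|^2 = \int t^2\,d\alpha_n(t) \le \int t^2\,d\alpha(t)$. Finally let us assume that (iii) is valid. Since $\alpha_n(E) \uparrow \alpha(E)$ for all Borel sets $E$ and $\int t^2\,d\alpha_n(t) \le \sum_n \|AA_2^{1/2}e_n\|^2 < \infty$ for all $n$, it easily follows that $\int t^2\,d\alpha(t) < \infty$. This proves that (iii) $\Leftrightarrow$ (iv) and finishes the proof of the lemma.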


Lemma 2: Let $A$ be a closed, densely defined operator with $p_2(D(A)) = 1$ and let $M = AA_2^{1/2}$. Then $A_2^{1/2}M$ is a bounded self-adjoint operator of trace class.

Proof: Since $M$ is bounded so is $A_2^{1/2}M$ and since $A_2^{1/2}M$ is symmetric it is self-adjoint. Further for any $x \in X$,

$$|(A_2^{1/2}AA_2^{1/2}x, x)| = |(AA_2^{1/2}x, A_2^{1/2}x)| \le \|A_2^{1/2}x\|\,\|AA_2^{1/2}x\| \le \tfrac{1}{2}\big(\|A_2^{1/2}x\|^2 + \|AA_2^{1/2}x\|^2\big) = (Vx, x)$$

where $V = \frac{1}{2}(A_2 + M^*M)$. Since this is true for all $x$ and since $V$ is a nonnegative operator of finite trace we may conclude that $A_2^{1/2}AA_2^{1/2}$ is of trace class.


Lemma 3: Let $A$ be a closed, symmetric, densely defined operator with $p_2(D(A)) = 1$. Suppose further that $((A + A_2^{-1})x, x) > 0$ for all nonzero $x \in R(A_2)$. Then $K = A_2^{1/2}AA_2^{1/2} + I$ is a bounded self-adjoint operator for which $K \ge k.I$ for some $k > 0$ so that $K$ has a bounded inverse. Moreover, $A + A_2^{-1}$ (defined on $R(A_2)$) is a closed symmetric operator.

Proof: By Lemma 2, $K$ is a bounded self-adjoint operator. Since $AA_2^{1/2}$ is bounded, $A_2^{1/2}AA_2^{1/2}$ is compact and hence $K$ has a pure point spectrum with its eigenvalues converging to 1. For any $x \in R(A_2)$, $((A + A_2^{-1})x, x) = (KA_2^{-1/2}x, A_2^{-1/2}x) \ge 0$ so that $K \ge 0$. Thus in order to prove that $K \ge k.I$ for some $k > 0$ it is enough to prove that 0 is not an eigenvalue of $K$. Suppose for some $y \ne 0$, $Ky = A_2^{1/2}AA_2^{1/2}y + y = 0$. Then $A_2^{1/2}(AA_2^{1/2}y) = -y$, showing that $y \in R(A_2^{1/2})$ and $AA_2^{1/2}y = -A_2^{-1/2}y$. If $x = A_2^{1/2}y$, then $x \in R(A_2)$, $A_2^{-1}x = A_2^{-1/2}y$ and $Ax + A_2^{-1}x = 0$. Since $x \ne 0$ and $((A + A_2^{-1})x, x) = 0$, we have a contradiction. Finally we claim that $A + A_2^{-1}$, defined on $R(A_2)$, is a closed symmetric operator. The symmetry is obvious. Suppose $x_n \in R(A_2)$ for all $n$ and $x_n \to x$, $Ax_n + A_2^{-1}x_n \to z$. Write $x_n = A_2^{1/2}y_n$. Then $A_2^{1/2}(Ax_n + A_2^{-1}x_n) = Ky_n \to A_2^{1/2}z$ and since $K^{-1}$ is bounded, $y_n \to$ a limit, say $y$. Since $AA_2^{1/2}$ is bounded, $AA_2^{1/2}y_n = Ax_n$ tends to a limit and hence $A_2^{-1}x_n$ tends to a limit. Since $A_2^{-1}$ is closed, $x \in D(A_2^{-1}) = R(A_2)$. It is then obvious that $Ax_n + A_2^{-1}x_n \to Ax + A_2^{-1}x$. The lemma is proved.


Lemma 4: Let $A$ be a closed, symmetric, densely defined operator with $p_2(D(A)) = 1$ and let $((A + A_2^{-1})x, x) > 0$ for all nonzero $x \in R(A_2)$. Then there exists a unique nonsingular dispersion operator $A_1'$ such that $A + A_2^{-1} = (A_1')^{-1}$.

Proof: Since $K \ge k.I$ we see that $((A + A_2^{-1})x, x) \ge k(A_2^{-1}x, x)$ for all $x \in R(A_2)$ and since $A_2^{-1} \ge k'.I$ for some $k' > 0$ we see that for any $x \in R(A_2)$ with $\|x\| = 1$, $\|(A + A_2^{-1})x\| \ge |((A + A_2^{-1})x, x)| \ge kk' > 0$. Since $A + A_2^{-1}$ is closed, it follows from this that the range of $A + A_2^{-1}$ is a closed linear manifold in $X$, $A_1' = (A + A_2^{-1})^{-1}$ is defined on this closed linear manifold, and is a bounded linear transformation thereon. We now claim that this range is the whole of $X$. It is enough to prove that it is dense in $X$. Suppose now for some $u$, $((A + A_2^{-1})x, u) = 0$ for all $x \in R(A_2)$. Then $((A + A_2^{-1})A_2^{1/2}y, u) = 0$ for all $y \in R(A_2^{1/2})$ and hence $(My, u) = -(A_2^{-1/2}y, u)$ for all $y \in R(A_2^{1/2})$. Since $M$ $(= AA_2^{1/2})$ is bounded, this shows that $u \in R(A_2^{1/2})$ and $M^*u + A_2^{-1/2}u = 0$ so that $(M^*A_2^{1/2} + I)A_2^{-1/2}u = 0$. Since $A_2^{1/2}M$ is bounded self-adjoint, $M^*A_2^{1/2} + I = K$ so that $KA_2^{-1/2}u = 0$. Consequently $A_2^{-1/2}u = 0$ and hence $u = 0$. This proves that $R(A + A_2^{-1}) = X$ and hence $A_1'$ is a bounded operator on $X$. As it is the inverse of $A + A_2^{-1}$, $A_1'$ is symmetric, $\ge 0$ and is nonsingular. Since $K = A_2^{1/2}AA_2^{1/2} + I$ it follows by a straightforward computation that $A_2^{-1/2}KA_2^{-1/2} = (A_1')^{-1}$. Now $K^{-1}$ is bounded and hence $A_1' = A_2^{1/2}K^{-1}A_2^{1/2} \le cA_2$ for some $c > 0$, showing that $\operatorname{tr}(A_1') < \infty$. This proves that $A_1'$ is a dispersion operator. The proof of the lemma is complete.


Let $Q_n$ be the projection on the subspace spanned by $e_1, \ldots, e_n$ and let $Q_n' = I - Q_n$. For any linear operator $F$ whose domain includes all the vectors $e_n$ we write $F_n$ for the $n \times n$ matrix $((Fe_i, e_j))_{1 \le i, j \le n}$. Many properties of the linear operator can be reduced to an analysis of the asymptotic behaviour of the matrices $F_n$ as $n \to \infty$. For example, if $F$ is a bounded self-adjoint operator such that $F \ge a.I$ for some $a > 0$ and $F - I$ is of trace class then $|F_n| > 0$ (where $|\ |$ denotes determinant) for all $n$ and converges to a finite nonzero limit as $n \to \infty$. This limit is independent of the basis used to compute it. We shall write it as $|F|$ and call it the determinant of $F$. It is easy to show that $|F^{-1}|$ also exists in the above sense and is equal to $|F|^{-1}$.

With $A_1'$ as defined in Lemma 4 form the matrix $C_n = ((A_1')^{-1})_n$, i.e. the matrix whose $(i, j)$-th element is $((A_1')^{-1}e_i, e_j)$. For any $y \in X$ we denote by $y_n$ the row vector $((y, e_1), \ldots, (y, e_n))$.


Lemma 5: With the above notation

$$\lim_{n\to\infty} y_n C_n^{-1} y_n^t = (A_1'y, y)$$

for all $y \in X$.

Proof: Fix $y \in X$. We know that two Gaussian measures with mean difference $A_1'y$ and dispersion operator $A_1'$ are equivalent and the $D^2$ between these two measures is $(A_1'y, y)$. On the other hand, if $\zeta_j(x) = (x, (A_1')^{-1}e_j)$, the dispersion matrix of $\zeta_1, \ldots, \zeta_n$ is $C_n$ while the row vector of mean differences is $((y, e_1), \ldots, (y, e_n))$, so that the $D^2$ between the two distributions of $\zeta_1, \ldots, \zeta_n$ is $y_n C_n^{-1} y_n^t$. It now follows from Theorem 4.1 that $y_n C_n^{-1} y_n^t \to (A_1'y, y)$. This proves the lemma.

Lemma 6: Let $A$ be a closed densely defined symmetric operator with $p_2(D(A)) = 1$. Then,

$$\lim_{n\to\infty}(Q_n A Q_n y, y) = (Ay, y)$$

for almost all $y$.

Proof: Since $p_2(D(A)) = 1$, it is clearly enough to prove the lemma with $A$ replaced by any one of its extensions. The Hilbert space $X$ being real there is at least one self-adjoint extension of $A$.* We may therefore assume that $A$ itself is self-adjoint. Moreover any self-adjoint $A$ can be written as $A^+ - A^-$ where $A^+$ and $A^-$ are $\ge 0$, self-adjoint and $D(A^+) \supseteq D(A)$, $D(A^-) \supseteq D(A)$. It is thus enough to prove Lemma 6 when $A$ is self-adjoint and $\ge 0$.

* cf. Stone (1932), p. 357.
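Let $P$ be the spectral measure of $A$. For any $\varepsilon > 0$ let $F_j$ be the set $\{t : j\varepsilon \le t < (j + 1)\varepsilon\}$ and $A_\varepsilon$ the operator $\sum_j j\varepsilon\,P_{F_j}$. Since $\int (P_{F_j}y, y)\,dp_2(y) = \operatorname{tr}(A_2^{1/2}P_{F_j}A_2^{1/2})$ it follows that $\sum_j j\varepsilon \int (P_{F_j}y, y)\,dp_2(y) = \operatorname{tr}(A_2^{1/2}A_\varepsilon A_2^{1/2}) < \infty$ and hence that the function $y \to (A_\varepsilon y, y)$ is integrable with $\int (A_\varepsilon y, y)\,dp_2(y) = \operatorname{tr}(A_2^{1/2}A_\varepsilon A_2^{1/2})$. Now for any $\varepsilon$, $D(A_\varepsilon) = D(A)$ and $|(Ay, y) - (A_\varepsilon y, y)| \le \varepsilon(y, y)$ so that

$$|\operatorname{tr}(A_2^{1/2}A_\varepsilon A_2^{1/2}) - \operatorname{tr}(A_2^{1/2}AA_2^{1/2})| \le \sum_n |(A_2^{1/2}(A - A_\varepsilon)A_2^{1/2}e_n, e_n)| \le \varepsilon\sum_n (A_2^{1/2}e_n, A_2^{1/2}e_n) = \varepsilon\operatorname{tr}(A_2)$$

so that $\int (A_\varepsilon y, y)\,dp_2(y) = \operatorname{tr}(A_2^{1/2}AA_2^{1/2}) + O(\varepsilon)$. Since $(A_\varepsilon y, y) \to (Ay, y)$ as $\varepsilon \to 0$ we see that $\int (Ay, y)\,dp_2(y)$ is finite by using Fatou's lemma. Moreover $|\int (Ay, y)\,dp_2(y) - \int (A_\varepsilon y, y)\,dp_2(y)| \le \varepsilon\int (y, y)\,dp_2(y) = O(\varepsilon)$. We thus see that the function $y \to (Ay, y)$ is $p_2$-integrable and

$$\int (Ay, y)\,dp_2(y) = \operatorname{tr}(A_2^{1/2}AA_2^{1/2}).$$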


Now let $\eta_j(x) = (x, e_j)$. The $\eta_1, \eta_2, \ldots$ are independent under $p_2$. Let $\mathcal{C}_n$ be the smallest $\sigma$-algebra with respect to which $\eta_{n+1}, \eta_{n+2}, \ldots$ are all measurable. Then $\mathcal{C}_1 \supseteq \mathcal{C}_2 \supseteq \ldots$, and by the zero-one law, $\bigcap_n \mathcal{C}_n$ consists only of sets which are of measure 0 or 1. Therefore, by the martingale convergence theorem, if we have any integrable function $g$, and write $g_n = E_{p_2}(g \mid \mathcal{C}_n)$, then $g_n(y) \to \int g\,dp_2$ for almost all $y$. Take for $g$ the function $y \to (Ay, y)$. For any $n$, $g(y) = (AQ_n y, Q_n y) + 2(Q_n y, A(I - Q_n)y) + (A(I - Q_n)y, (I - Q_n)y)$. Hence, a straightforward computation yields

$$g_n(y) = \operatorname{tr}(A_{2n}^{1/2}A_n A_{2n}^{1/2}) + (A(I - Q_n)y, (I - Q_n)y).$$

Since $\operatorname{tr}(A_{2n}^{1/2}A_n A_{2n}^{1/2}) = \sum_{i=1}^{n}(A_2^{1/2}AA_2^{1/2}e_i, e_i) \to \operatorname{tr}(A_2^{1/2}AA_2^{1/2}) = \int g\,dp_2$, it follows that $(A(I - Q_n)y, (I - Q_n)y) \to 0$ for almost all $y$. But $(A(I - Q_n)y, (I - Q_n)y) = (Ay, y) + (Q_n A Q_n y, y) - 2(Ay, Q_n y) = (Q_n A Q_n y, y) - (Ay, y) + \varepsilon_n(y)$ where $\varepsilon_n(y) \to 0$ as $n \to \infty$ for all $y$. We therefore conclude that $(Q_n A Q_n y, y) \to (Ay, y)$ for almost all $y$. Lemma 6 is thus proved.
Lemma 7: Let $A$ be a closed, densely defined, symmetric operator with $p_2(D(A)) = 1$ and let $g(y) = \exp[-\frac{1}{2}(Ay, y)]$. Then in order that $\int g\,dp_2 < \infty$ it is necessary and sufficient that $((A + A_2^{-1})x, x) > 0$ for all non-zero $x \in R(A_2)$. In that case $\int g(y)\,dp_2(y) = |K|^{-1/2}$ where $K = A_2^{1/2}AA_2^{1/2} + I$ and $|K|$ denotes the determinant of $K$. Moreover, for any $u \in X$, $g(y)e^{i(y,u)}$ is integrable and

$$\int g(y)e^{i(y,u)}\,dp_2(y) = |K|^{-1/2}\exp[-\tfrac{1}{2}(A_1'u, u)].$$


Proof: We shall first examine the conditions for the integrability of $g$. Suppose that $((A + A_2^{-1})x, x) > 0$ for all non-zero $x \in R(A_2)$. For any $n$ let $f_n(y) = (Q_n A Q_n y, y)$. By Lemma 6, $\exp[-\frac{1}{2}f_n(y)] \to \exp[-\frac{1}{2}(Ay, y)] = g(y)$ for almost all $y$. Moreover $\int \exp[-\frac{1}{2}f_n(y)]\,dp_2(y)$ can be easily checked to be equal to the integral $J_n$, where

$$J_n = (2\pi)^{-n/2}|A_{2n}|^{-1/2}\int \exp[-\tfrac{1}{2}z(A_n + A_{2n}^{-1})z^t]\,dz$$

and $z = (z_1, \ldots, z_n)$. The assumption on $A + A_2^{-1}$ implies that the matrix $A_n + A_{2n}^{-1}$ is positive definite and hence

$$J_n = |A_{2n}|^{-1/2}|A_n + A_{2n}^{-1}|^{-1/2} = |A_{2n}^{1/2}A_n A_{2n}^{1/2} + I_n|^{-1/2}$$

so that $J_n = |K_n|^{-1/2}$. Since $\exp[-\frac{1}{2}f_n(y)] > 0$ and converges by Lemma 6 to $g$ almost everywhere, and since $J_n$ tends to $|K|^{-1/2}$, it follows from Fatou's lemma that $g$ is integrable with its integral $\le |K|^{-1/2}$.

Conversely let us assume that $g$ is integrable. Let $\xi_j(x) = (x, e_j)$ and $\mathcal{C}_n$ the smallest $\sigma$-algebra with respect to which $\xi_{n+1}, \xi_{n+2}, \ldots$ are all measurable. Let $h(y) = g(y)\exp[i(y, u)]$ and define $g_n = E_{p_2}(g \mid \mathcal{C}_n)$, $h_n = E_{p_2}(h \mid \mathcal{C}_n)$. The argument given in Lemma 6 may now be used to prove that $g_n(y) \to E_{p_2}(g)$ and $h_n(y) \to E_{p_2}(h)$ for almost all $y$. Since $g$ is integrable and since $(Ay, y) = (Az, z) + 2(z, Aw) + (Aw, w)$ where $z = Q_n y$, $w = (I - Q_n)y$, it follows that for almost all $w$, the function

$$z \to \exp\{-\tfrac{1}{2}((Az, z) + 2(Aw, z))\}$$

is integrable with respect to the $n$-dimensional normal distribution having dispersion operator $Q_n A_2 Q_n$. The integrability for even one $w$ implies that the matrix $C_n = A_n + A_{2n}^{-1}$ is positive definite. A straightforward computation now yields

$$g_n(y) = |A_{2n}|^{-1/2}|C_n|^{-1/2}\exp[-\tfrac{1}{2}(Ay_n, y_n) + \tfrac{1}{2}z_n C_n^{-1}z_n^t]$$

where $y_n = (I - Q_n)y$ and $z_n$ is the row vector whose $i$-th component is $(Q_n A y_n, e_i)$. A similar computation on $h$ yields

$$h_n(y) = |A_{2n}|^{-1/2}|C_n|^{-1/2}\exp[-\tfrac{1}{2}(Ay_n, y_n) + i(y_n, u) + \tfrac{1}{2}z_n C_n^{-1}z_n^t - \tfrac{1}{2}u_n C_n^{-1}u_n^t - iu_n C_n^{-1}z_n^t]$$

where $u_n$ is the row vector with $i$-th component $(Q_n u, e_i)$. Comparing the formulae for $g_n$ and $h_n$ we get

$$h_n(y) = g_n(y)\exp[-\tfrac{1}{2}u_n C_n^{-1}u_n^t]\exp[i(y_n, u) - iu_n C_n^{-1}z_n^t].$$

Let us write $J$ for $\int g\,dp_2$ and $J(u)$ for $\int h\,dp_2$. Since $|h_n(y)| \to |J(u)|$ as $n \to \infty$ for almost all $y$, we may conclude that $g_n(y)\exp[-\frac{1}{2}u_n C_n^{-1}u_n^t]$ converges to $|J(u)|$ for almost all $y$. Since $g_n(y) \to J$ for almost all $y$ it follows that $u_n C_n^{-1}u_n^t$ has a limit for each $u \in X$. There exists a unique bounded self-adjoint operator $H_n$ on $X$ such that $(H_n u, u) = u_n C_n^{-1}u_n^t$ for all $u$. Since $\lim_{n\to\infty}(H_n u, u)$ exists for all $u \in X$ it follows from a well-known theorem that $\|H_n\| \le k$ for all $n$, $k > 0$ being a finite constant. This shows that $0 < C_n^{-1} \le kI_n$ for all $n$ ($I_n$ being the $n \times n$ unit matrix) and hence that $C_n \ge \frac{1}{k}I_n$ for all $n$. We thus deduce that $A_{2n}^{1/2}C_n A_{2n}^{1/2} \ge \frac{1}{k}A_{2n}$ for all $n$. If $K$ denotes the bounded operator $A_2^{1/2}AA_2^{1/2} + I$, this means that $K_n \ge \frac{1}{k}A_{2n}$ for all $n$ and hence $K \ge \frac{1}{k}A_2$. Since $A_2$ is non-singular, this inequality implies that $Ky = 0$ only when $y = 0$. Since $K$ has a pure point spectrum with non-negative eigenvalues which converge to 1, this implies that $K$ is invertible and hence $K \ge k'.I$ for some $k' > 0$, from which we may conclude that $((A + A_2^{-1})x, x) > 0$ for all non-zero $x \in R(A_2)$. From Lemma 5 we know that $u_n C_n^{-1}u_n^t \to (A_1'u, u)$.


We shall now evaluate the limit of $h_n(y)$ for each fixed $u$. Since $g_n(y) \to J$ and $u_n C_n^{-1}u_n^t \to (A_1'u, u)$ it follows that $\exp[i(u, y_n) - iu_n C_n^{-1}z_n^t]$ has a limit for almost all $y$. Since $(u, y_n) \to 0$, it follows that $\exp(-iu_n C_n^{-1}z_n^t)$ has a limit for almost all $y$. If we write $\ell_n(y) = -u_n C_n^{-1}z_n^t$ then $\ell_n$ is real linear in $y$ and $\exp[i\ell_n(y)]$ has a limit for almost all $y$, the limit being independent of $y$ ($= J(u)/|J(u)|$ in fact). Such a limit must necessarily be equal to 1 and hence $h_n(y) \to J\exp[-\frac{1}{2}(A_1'u, u)]$. Since $K \ge k'.I$ and since $K - I$ is of trace class, $|K_n| = |A_{2n}^{1/2}C_n A_{2n}^{1/2}| = |A_{2n}|\,|C_n| \to |K|$ so that $|A_{2n}|^{-1/2}|C_n|^{-1/2} \to |K|^{-1/2}$. Since $(Ay_n, y_n) = (Ay, y) + (Q_n A Q_n y, y) - 2(Q_n y, Ay)$ we may conclude from Lemma 6 that $(Ay_n, y_n) \to 0$ for almost all $y$. We thus finally see, since $z_n C_n^{-1}z_n^t \ge 0$, that $J \ge |K|^{-1/2}$ and hence that $g_n(y) \to |K|^{-1/2}$ and $h_n(y) \to |K|^{-1/2}\exp[-\frac{1}{2}(A_1'u, u)]$. Lemma 7 follows at once from this.


Remark. It might be of interest to illustrate Lemma 7 with some examples.

Let $X = L_2(0, 1)$ and $A_2$ the integral operator with kernel $k(s, t) = \min(s, t)$. $p_2$ then is Wiener measure with variance parameter 1. Let $\lambda > 0$ and $A = -\lambda.I$.

A simple computation shows that the eigenvalues $\lambda_1, \lambda_2, \ldots$ of $A_2$ are all simple and $\lambda_n = 4\pi^{-2}(2n - 1)^{-2}$. According to Lemma 7, the function $x \to \exp[\frac{1}{2}\lambda(x, x)]$ is integrable if and only if $-\lambda(x, x) + (A_2^{-1}x, x) > 0$ for all non-zero $x \in R(A_2)$. We claim that this is equivalent to the condition $\lambda < \lambda_n^{-1}$ for all $n$. If $x = e_n$, then $-\lambda(x, x) + (A_2^{-1}x, x) > 0$ gives $\lambda < \lambda_n^{-1}$. On the other hand suppose $\lambda < \lambda_n^{-1}$ for all $n$. Then for any $x = \sum_n x_n e_n \in R(A_2)$, $-\lambda(x, x) + (A_2^{-1}x, x) = \sum_n (-\lambda + \lambda_n^{-1})x_n^2 \ge 0$ with equality attained only when $x_n = 0$ for all $n$. Lemma 7 then gives the value of the integral as

$$|-\lambda A_2 + I|^{-1/2} = \Big[\prod_{n}\Big(1 - \frac{4\lambda}{\pi^2(2n - 1)^2}\Big)\Big]^{-1/2} = (\cos\sqrt{\lambda})^{-1/2},$$

$\lambda < \pi^2/4$ being the condition for integrability. Lemma 7 thus yields the formula

$$\int \exp\Big[\frac{1}{2}\lambda\int_0^1 x(t)^2\,dt\Big]\,dp_2(x) = (\sec\sqrt{\lambda})^{1/2} \qquad (\sqrt{\lambda} < \pi/2)$$

(cf. Gelfand and Yaglom (1960) and Silov (1963)).
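The Wiener-measure example can be checked numerically. The sketch below evaluates a long partial product of $\prod_n (1 - \lambda\lambda_n)$ with $\lambda_n = 4\pi^{-2}(2n - 1)^{-2}$ and compares its power $-1/2$ with $(\sec\sqrt{\lambda})^{1/2}$. The function names, the truncation length and the choice $\lambda = 1$ are illustrative assumptions.

```python
import math


def fredholm_product(lam, terms=200_000):
    """Partial product of prod_n (1 - lam * lambda_n), lambda_n = 4 / (pi^2 (2n-1)^2)."""
    log_prod = 0.0
    for n in range(1, terms + 1):
        log_prod += math.log1p(-4.0 * lam / (math.pi ** 2 * (2 * n - 1) ** 2))
    return math.exp(log_prod)


def sec_formula(lam):
    """Closed form (sec sqrt(lam))^{1/2}; only meaningful for 0 < lam < pi^2 / 4."""
    if not 0.0 < lam < math.pi ** 2 / 4.0:
        raise ValueError("integrability requires 0 < lam < pi^2 / 4")
    return math.cos(math.sqrt(lam)) ** -0.5


lam = 1.0
approx = fredholm_product(lam) ** -0.5   # truncated determinant |I - lam * A2|^{-1/2}
exact = sec_formula(lam)
print(approx, exact)
```

As $\lambda$ approaches $\pi^2/4$ the first factor of the product tends to zero, which is the numerical counterpart of the loss of integrability.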


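As another example let us consider the function

$$f_\rho : x \to \exp\Big[\frac{1}{2}\lambda\int_0^1 \rho(t)x(t)^2\,dt\Big]$$

where $\lambda > 0$ and $\rho$ is a continuous non-negative function on $[0, 1]$. Let $m(\Delta) = \inf_{t \in \Delta}\rho(t)$ for any closed interval $\Delta \subset [0, 1]$ and let us write $\sigma_\rho = \sup_\Delta (|\Delta|^2 m(\Delta))$ ($|\Delta|$ denotes the length of $\Delta$). Lemma 7 can then be used to prove that $f_\rho$ is integrable if and only if the condition

$$\lambda\sigma_\rho < \pi^2/4$$

is satisfied. When $\rho(t) = 1$, $\sigma_\rho = 1$, leading to the first example. When $\rho(t) = t^2$, $\sigma_\rho = 1/16$ and the condition becomes $\lambda < 4\pi^2$.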


We are now in a position to formulate and prove our second main theorem of 
this section. 


Theorem 6.2: Let $p_1$ and $p_2$ be two equivalent Gaussian measures with means 0 and (non-singular) dispersion operators $A_1$ and $A_2$. Let $L = \log(dp_1/dp_2)$. In order that $L$ be a quadratic form it is necessary and sufficient that $R(A_1) = R(A_2)$ and that $(A_1^{-1} - A_2^{-1})A_2^{1/2}$ has a bounded extension $M$ to the whole of $X$ with $\operatorname{tr}(M^*M) < \infty$. In this case the closure $A$ of $A_1^{-1} - A_2^{-1}$ exists, is a closed symmetric operator with $p_2(D(A)) = 1$, and for almost all $x$

$$L(x) = -\frac{1}{2}(Ax, x) - \frac{1}{2}\log|S|$$

where $S$ is the operator satisfying the equation $A_1 = A_2^{1/2}SA_2^{1/2}$.


Proof: Suppose first that the conditions on $A_1$ and $A_2$, specified in the theorem, are satisfied. $A_1^{-1} - A_2^{-1}$ is then densely defined and since it is symmetric, its closure exists. Let $A$ denote this closure. The finiteness of $\operatorname{tr}(M^*M)$ implies that $\sum_n \|AA_2^{1/2}e_n\|^2 < \infty$, and hence by Lemma 1, $p_2(D(A)) = 1$. Since $A + A_2^{-1} = A_1^{-1}$ on $R(A_2)$, the conditions of Lemma 7 are satisfied with $A_1' = A_1$. Consequently $\exp[-\frac{1}{2}(Ax, x)]$ is an integrable function with

$$\int \exp[-\tfrac{1}{2}(Ax, x) + i(x, u)]\,dp_2(x) = |K|^{-1/2}\exp[-\tfrac{1}{2}(A_1 u, u)]$$

for all $u$. Now $K = A_2^{1/2}AA_2^{1/2} + I$ and hence $Ke_n = A_2^{1/2}(A_1^{-1} - A_2^{-1})A_2^{1/2}e_n + e_n = A_2^{1/2}A_1^{-1}A_2^{1/2}e_n = S^{-1}e_n$ for all $n$. This implies that $K = S^{-1}$ and hence that $|K|^{-1/2} = |S|^{1/2}$. We thus conclude that

$$|S|^{-1/2}\int \exp[-\tfrac{1}{2}(Ax, x) + i(x, u)]\,dp_2(x) = \exp[-\tfrac{1}{2}(A_1 u, u)].$$

This shows that $dp_1/dp_2$ is the function $x \to |S|^{-1/2}\exp[-\frac{1}{2}(Ax, x)]$ and leads to the result that

$$L(x) = -\frac{1}{2}(Ax, x) - \frac{1}{2}\log|S|.$$
Conversely, let us assume that $L$ is a quadratic form. Then there exists a closed, symmetric, densely defined operator $A$ with $p_2(D(A)) = 1$ such that $L(x) = -\frac{1}{2}(Ax, x) + c$ for almost all $x$, $c$ being a constant. This implies that the function $x \to \exp[-\frac{1}{2}(Ax, x)]$ is integrable with respect to $p_2$. From Lemma 1 we infer that $M = AA_2^{1/2}$ is everywhere defined and $\operatorname{tr}(M^*M) < \infty$. Moreover $R(A_2) \subset D(A)$ and hence $A + A_2^{-1}$ is defined on $R(A_2)$. Lemma 7 implies that $A + A_2^{-1} = (A_1')^{-1}$ where $A_1'$ is a non-singular dispersion operator. Evidently $R(A_1') = R(A_2)$. Theorem 6.2 will be completely proved if we show that $A_1 = A_1'$. From Lemma 7, we have, for all $u \in X$,

$$\int \exp[-\tfrac{1}{2}(Ax, x) + i(x, u)]\,dp_2(x) = |S|^{1/2}\exp[-\tfrac{1}{2}(A_1'u, u)].$$

Since $L(x) = -\frac{1}{2}(Ax, x) + c$ and since

$$\int \exp[-\tfrac{1}{2}(Ax, x)]\,dp_2(x) = |S|^{1/2},$$

we conclude from the equation $\int \exp L(x)\,dp_2(x) = 1$ that

$$c = -\frac{1}{2}\log|S|$$

and hence that

$$\int \exp[L(x) + i(x, u)]\,dp_2(x) = \exp[-\tfrac{1}{2}(A_1'u, u)].$$

Since $L = \log dp_1/dp_2$, the integral on the left is equal to $\exp[-\frac{1}{2}(A_1 u, u)]$. This proves that $A_1 = A_1'$ and finishes the proof of Theorem 6.2.
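In finite dimensions every hypothesis of Theorem 6.2 holds automatically, so its conclusion can be checked directly. The following sketch, with randomly generated positive definite matrices standing in for $A_1$ and $A_2$ (an illustrative assumption, as are all names), verifies the identity $K = S^{-1}$ used in the proof and compares $-\frac{1}{2}(Ax, x) - \frac{1}{2}\log|S|$ with the log ratio of the two normal densities.

```python
import numpy as np


def sym_sqrt(mat):
    """Symmetric square root of a positive definite matrix via its eigen-decomposition."""
    w, v = np.linalg.eigh(mat)
    return (v * np.sqrt(w)) @ v.T


def random_spd(rng, n):
    """Random positive definite matrix, used here as a dispersion matrix."""
    b = rng.standard_normal((n, n))
    return b @ b.T + n * np.eye(n)


rng = np.random.default_rng(1)
n = 4
A1, A2 = random_spd(rng, n), random_spd(rng, n)

A2_half = sym_sqrt(A2)
A2_half_inv = np.linalg.inv(A2_half)
S = A2_half_inv @ A1 @ A2_half_inv              # so that A1 = A2^{1/2} S A2^{1/2}
A = np.linalg.inv(A1) - np.linalg.inv(A2)       # finite-dimensional A1^{-1} - A2^{-1}
K = A2_half @ A @ A2_half + np.eye(n)

x = rng.standard_normal(n)
formula = -0.5 * x @ A @ x - 0.5 * np.linalg.slogdet(S)[1]

# Direct log density ratio of N(0, A1) against N(0, A2); the 2*pi terms cancel.
quad1 = x @ np.linalg.solve(A1, x)
quad2 = x @ np.linalg.solve(A2, x)
direct = (-0.5 * (np.linalg.slogdet(A1)[1] + quad1)
          + 0.5 * (np.linalg.slogdet(A2)[1] + quad2))
print(direct, formula)
```

The substance of the theorem lies in the infinite-dimensional conditions on ranges and on $\operatorname{tr}(M^*M)$, which a finite-dimensional check of this kind cannot exercise.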


Corollary: In order that $L$ be almost everywhere equal to $c + q$ where $c$ is a constant and $q$ the quadratic form associated with a bounded self-adjoint operator it is necessary and sufficient that $R(A_1) = R(A_2)$ and $A_1^{-1} - A_2^{-1}$ be a bounded operator on this linear manifold. If $A$ denotes the bounded extension of $A_1^{-1} - A_2^{-1}$ to $X$, then $A$ is a bounded self-adjoint operator and

$$L(x) = -\frac{1}{2}(Ax, x) - \frac{1}{2}\log|S|.$$

Remarks: (1) It may be noticed that the conclusion that $L$ is a quadratic form is symmetric between $p_1$ and $p_2$ while the conditions derived on $A_1$ and $A_2$ do not exhibit this symmetry. But this is only an apparent difficulty. Suppose in fact that $R(A_1) = R(A_2)$ and $A_1^{-1} - A_2^{-1}$ has the closure $A$. We claim that $N = (-A)A_1^{1/2}$ is an everywhere defined bounded operator with $\operatorname{tr}(N^*N) < \infty$. Since $A_1 = A_2^{1/2}SA_2^{1/2}$, $A_1^{1/2} = A_2^{1/2}S^{1/2}U$ where $U$ is a unitary operator and hence $N = -AA_2^{1/2}S^{1/2}U$ which is everywhere defined. Moreover $N = -MS^{1/2}U$ so that $N^*N = U^*S^{1/2}M^*MS^{1/2}U$. To prove that $\operatorname{tr}(N^*N) < \infty$ it suffices to prove that $\operatorname{tr}(S^{1/2}M^*MS^{1/2}) < \infty$ and for this it is enough to find an orthonormal basis $f_1, f_2, \ldots$ of $X$ such that $\sum_n (S^{1/2}M^*MS^{1/2}f_n, f_n) < \infty$. Since $\operatorname{tr}(S - I)^2 < \infty$, $S$ has a pure point spectrum and hence we can choose an orthonormal basis $f_1, f_2, \ldots$ for $X$ such that $Sf_n = s_n f_n$ for all $n$, $s_n > 0$ a constant. Since $\sum_n (s_n - 1)^2 < \infty$, $s_n \to 1$ as $n \to \infty$. Consequently

$$\sum_n (S^{1/2}M^*MS^{1/2}f_n, f_n) = \sum_n s_n(M^*Mf_n, f_n) \le \big(\sup_n s_n\big)\operatorname{tr}(M^*M) < \infty$$

since $s_n \to 1$ and $\operatorname{tr}(M^*M) < \infty$.


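(2) It might be interesting to notice that even the assumption that $S - I$ is of trace class need not imply the conditions of Theorem 6.2. We shall now describe an example where $S - I$ is of trace class and yet $R(A_1)$ does not coincide with $R(A_2)$.

Let $\lambda_1, \lambda_2, \ldots > 0$ be chosen so that $\sum_n \lambda_n^{1/2} < 1/4$. Let $e_1, e_2, \ldots$ be an orthonormal basis for $X$ and let $A_2 e_n = \lambda_n e_n$ for all $n$. We define a bounded operator $S$ by setting

$$Se_j = \begin{cases} (1 + 2\lambda_1^{1/2})e_1 + \lambda_2^{1/2}e_2 + \lambda_3^{1/2}e_3 + \ldots & \text{if } j = 1 \\ \lambda_j^{1/2}e_1 + e_j & \text{if } j > 1. \end{cases}$$

If $x = x_1 e_1 + x_2 e_2 + \ldots$ is a vector of norm 1, then

$$(Sx, x) = 1 + 2x_1(x_1\lambda_1^{1/2} + x_2\lambda_2^{1/2} + \ldots)$$

and since $|x_1(x_1\lambda_1^{1/2} + x_2\lambda_2^{1/2} + \ldots)| \le |x_1|\sum_j |x_j|\lambda_j^{1/2} < 1/4$ we see that $(Sx, x) \ge \frac{1}{2}$ for all such $x$. This shows that $S$ is a bounded self-adjoint operator with $S \ge \frac{1}{2}I$. Moreover,

$$(S - I)e_j = \begin{cases} 2\lambda_1^{1/2}e_1 + \lambda_2^{1/2}e_2 + \ldots & \text{if } j = 1 \\ \lambda_j^{1/2}e_1 & \text{if } j > 1 \end{cases}$$

from which we may conclude that $S - I$ is of trace class. If we now write

$$A_1 e_j = \begin{cases} \lambda_1(1 + 2\lambda_1^{1/2})e_1 + \lambda_1^{1/2}\sum_{k \ge 2}\lambda_k e_k & \text{if } j = 1 \\ \lambda_1^{1/2}\lambda_j e_1 + \lambda_j e_j & \text{if } j > 1 \end{cases}$$

then $A_1 = A_2^{1/2}SA_2^{1/2}$ is a dispersion operator which is non-singular and the Gaussian measures with means 0 and dispersion operators $A_1$ and $A_2$ are equivalent. In this case $R(A_1) \ne R(A_2)$. In fact $e_1 \notin R(A_1)$. For, suppose $a_1, a_2, \ldots$ are constants with $\sum_k a_k^2 < \infty$ such that $A_1(a_1 e_1 + a_2 e_2 + \ldots) = e_1$. Using the definition of $A_1$ we get the equations

$$\lambda_1^{1/2}\lambda_k a_1 + \lambda_k a_k = 0 \qquad (k > 1).$$

The above set of equations imply that $a_k = -\lambda_1^{1/2}a_1$ for all $k > 1$ and hence that $a_k = 0$ for all $k$. This is a contradiction. In terms of matrices, $A_2$ is a diagonal matrix with diagonal entries $\lambda_1, \lambda_2, \ldots$ while $A_1$ is the matrix $(a_{ij})$ where $a_{11} = \lambda_1(1 + 2\lambda_1^{1/2})$, $a_{1j} = a_{j1} = \lambda_1^{1/2}\lambda_j$ $(j > 1)$ and $a_{ij} = \lambda_i\delta_{ij}$ $(i, j \ge 2)$.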
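A finite truncation of the matrices in remark (2) shows the mechanism at work. In the sketch below the choice $\lambda_k^{1/2} = 0.2 \cdot 2^{-k}$ (so that $\sum_k \lambda_k^{1/2} < 1/4$), the truncation size and all names are illustrative assumptions. Solving $A_1 a = e_1$ for the truncated matrix gives coordinates with $a_k = -\lambda_1^{1/2}a_1$ for every $k > 1$: the entries do not decay, so $\sum_k a_k^2$ grows linearly with the truncation size, in line with $e_1 \notin R(A_1)$ in the limit.

```python
import numpy as np


def truncated_example(n, scale=0.2):
    """n x n truncations of S, A2 and A1 = A2^{1/2} S A2^{1/2} from remark (2)."""
    root = scale * 0.5 ** np.arange(1, n + 1)   # lambda_k^{1/2}; their sum is below 1/4
    S = np.eye(n)
    S[0, :] += root                             # first row:    lambda_j^{1/2}
    S[:, 0] += root                             # first column; entry (1,1) becomes 1 + 2*lambda_1^{1/2}
    A2 = np.diag(root ** 2)
    A1 = np.diag(root) @ S @ np.diag(root)
    return root, S, A2, A1


n = 10
root, S, A2, A1 = truncated_example(n)
min_eig_S = np.linalg.eigvalsh(S).min()         # should be at least 1/2

e1 = np.zeros(n)
e1[0] = 1.0
a = np.linalg.solve(A1, e1)                     # coordinates of A1^{-1} e_1
ratios = a[1:] / a[0]                           # expected to equal -lambda_1^{1/2}
print(min_eig_S, ratios)
```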

(3) All these considerations simplify considerably if $A_1$ and $A_2$ commute. In this case we can choose an orthonormal basis $e_1, e_2, \ldots$ for $X$ such that $A_1 e_n = \mu_n e_n$, $A_2 e_n = \lambda_n e_n$ for all $n$, $\lambda_n > 0$ and $\mu_n > 0$ being constants. The condition for equivalence is now $\sum_n (\frac{\mu_n}{\lambda_n} - 1)^2 < \infty$. $Se_n = (\frac{\mu_n}{\lambda_n})e_n$ for all $n$ so that $S - I$ is of trace class if and only if $\sum_n |\frac{\mu_n}{\lambda_n} - 1| < \infty$. Since $\frac{\mu_n}{\lambda_n} \to 1$ it follows easily that $R(A_1) = R(A_2)$ and that the closure of $A_1^{-1} - A_2^{-1}$ is the unique self-adjoint operator $A$ for which $Ae_n = (\frac{1}{\mu_n} - \frac{1}{\lambda_n})e_n$ for all $n$. $D(A)$ consists of all $x \in X$ for which $\sum_n (\frac{1}{\mu_n} - \frac{1}{\lambda_n})^2(x, e_n)^2 < \infty$ and this has probability one if and only if $\sum_n (\frac{1}{\mu_n} - \frac{1}{\lambda_n})^2\lambda_n < \infty$, which is also the condition for $L$ to be a quadratic form. This however need not always happen. If $\lambda_n = 1/n^2$ and $\mu_n = 1/(n^2 + n^{1/2})$ then $\sum_n |\frac{\mu_n}{\lambda_n} - 1| < \infty$ but $\sum_n (\frac{1}{\mu_n} - \frac{1}{\lambda_n})^2\lambda_n = \infty$. In other words even though $R(A_1) = R(A_2)$ when $A_1 A_2 = A_2 A_1$, the conditions given in Theorem 6.2 need not be fulfilled always.
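The two series in remark (3) can be examined through their partial sums. The sketch below takes $\lambda_n = 1/n^2$ and $\mu_n = 1/(n^2 + n^{1/2})$; the exponent $1/2$ is read from a damaged passage and should be treated as an assumption, as should the function name. With this choice the first series behaves like $\sum n^{-3/2}$ and the second like the harmonic series.

```python
import numpy as np


def partial_sums(n_terms, exponent=0.5):
    """Partial sums of sum |mu/lam - 1| and sum (1/mu - 1/lam)^2 * lam."""
    n = np.arange(1, n_terms + 1, dtype=float)
    lam = 1.0 / n ** 2
    mu = 1.0 / (n ** 2 + n ** exponent)
    trace_sum = np.sum(np.abs(mu / lam - 1.0))            # finite limit: S - I is of trace class
    quad_sum = np.sum((1.0 / mu - 1.0 / lam) ** 2 * lam)  # grows like log(n_terms)
    return trace_sum, quad_sum


t_small, q_small = partial_sums(10 ** 3)
t_large, q_large = partial_sums(10 ** 6)
print(t_small, t_large)   # nearly equal: the series has essentially converged
print(q_small, q_large)   # the second keeps growing, roughly by log(1000)
```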

We shall finally take up the discussion of the general case when $p_1$ and $p_2$ are equivalent Gaussian measures with means $m_1$ and $m_2$ and dispersion operators $A_1$ and $A_2$. Let $A = \frac{1}{2}(A_1 + A_2)$. Then $m_1 - m_2 = \delta \in R(A^{1/2})$ and $A_1 = A_2^{1/2}SA_2^{1/2}$ for some bounded self-adjoint $S$ with $S \ge k.I$ for a constant $k > 0$ and $\operatorname{tr}(S - I)^2 < \infty$. It is easy to prove that there are bounded self-adjoint operators $S_1$ and $S_2$ with $S_1 \ge k_1 I$, $S_2 \ge k_2 I$ and $\operatorname{tr}(S_1 - I)^2 < \infty$ and $\operatorname{tr}(S_2 - I)^2 < \infty$ satisfying $A = A_1^{1/2}S_1 A_1^{1/2}$ and $A = A_2^{1/2}S_2 A_2^{1/2}$. It follows from this that $R(A_1^{1/2}) = R(A_2^{1/2}) = R(A^{1/2})$. If we now write $q_1$ and $q_2$ for the Gaussian measures with the same mean $m = \frac{1}{2}(m_1 + m_2)$ and dispersion operators $A_1$ and $A_2$, then it is an easy consequence of the above remarks and Theorem 5.1 that $p_1$, $p_2$, $q_1$ and $q_2$ are all equivalent. We are now in a position to prove our final theorem.

Theorem 6.3: Let $p_1$ and $p_2$ be equivalent Gaussian measures with means $m_1$, $m_2$ and dispersion operators $A_1$ and $A_2$. Let $m = \frac{1}{2}(m_1 + m_2)$ and $\delta = m_1 - m_2$. Suppose that (a) $R(A_1) = R(A_2)$, (b) $(A_1^{-1} - A_2^{-1})A_2^{1/2}$ extends to a bounded operator $M$ defined over all of $X$ with $\operatorname{tr}(M^*M) < \infty$, (c) $\delta \in R(A_1)$. Then the closure $A$ of $A_1^{-1} - A_2^{-1}$ exists, is a closed densely defined symmetric operator such that $p_2(D(A) + m) = 1$, and

$$L(x) = -\frac{1}{2}(A(x - m), (x - m)) + \frac{1}{2}(x, A_1^{-1}\delta + A_2^{-1}\delta) - \frac{1}{2}(m, A_1^{-1}\delta + A_2^{-1}\delta) - \frac{1}{8}(\delta, A\delta) - \frac{1}{2}\log|S|$$

for almost all $x$.

Proof: Since $p_1$, $p_2$, $q_1$ and $q_2$ are all equivalent we have

$$L = \log(dp_1/dq_1) + \log(dq_1/dq_2) - \log(dp_2/dq_2).$$

$p_1$ and $q_1$ are Gaussian measures with means $m_1$ and $m$ and dispersion operator $A_1$. Moreover $m_1 - m = \frac{1}{2}\delta \in R(A_1)$. Hence by Theorem 6.1

$$\log(dp_1/dq_1)(x) = \frac{1}{2}(x, A_1^{-1}\delta) - \frac{1}{2}(m, A_1^{-1}\delta) - \frac{1}{8}(\delta, A_1^{-1}\delta)$$

for almost all $x$. Similarly,

$$\log(dp_2/dq_2)(x) = -\frac{1}{2}(x, A_2^{-1}\delta) + \frac{1}{2}(m, A_2^{-1}\delta) - \frac{1}{8}(\delta, A_2^{-1}\delta)$$

for almost all $x$. Now since $dq_1/dq_2(x) = dq_1'/dq_2'(x - m)$ where $q_1'$ and $q_2'$ are Gaussian measures with means 0 and dispersion operators $A_1$ and $A_2$, Theorem 6.2 enables us to conclude that the closure $A$ of $A_1^{-1} - A_2^{-1}$ exists, is a closed, symmetric densely defined operator with $q_2'(D(A)) = 1$. Consequently $q_2(D(A) + m) = 1$ and for almost all $x$

$$\log\Big(\frac{dq_1}{dq_2}\Big)(x) = -\frac{1}{2}(A(x - m), (x - m)) - \frac{1}{2}\log|S|.$$

Combining all the three expressions we obtain the required formula for $L$.
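The formula of Theorem 6.3 can also be confirmed in a finite-dimensional stand-in, where conditions (a) to (c) hold trivially and $|S| = |A_1|/|A_2|$. The sketch below is only such a check; the function names, the dimension and the random inputs are illustrative assumptions.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log density of the n-dimensional normal N(mean, cov) at x."""
    d = x - mean
    quad = d @ np.linalg.solve(cov, d)
    return -0.5 * (len(x) * np.log(2.0 * np.pi) + np.linalg.slogdet(cov)[1] + quad)


def log_lr_theorem_6_3(x, m1, m2, A1, A2):
    """Finite-dimensional version of the expression for L in Theorem 6.3."""
    m = 0.5 * (m1 + m2)
    delta = m1 - m2
    A = np.linalg.inv(A1) - np.linalg.inv(A2)                    # A1^{-1} - A2^{-1}
    w = np.linalg.solve(A1, delta) + np.linalg.solve(A2, delta)  # A1^{-1}d + A2^{-1}d
    log_det_S = np.linalg.slogdet(A1)[1] - np.linalg.slogdet(A2)[1]
    z = x - m
    return (-0.5 * z @ A @ z + 0.5 * x @ w - 0.5 * m @ w
            - 0.125 * delta @ A @ delta - 0.5 * log_det_S)


rng = np.random.default_rng(2)
n = 4
b1, b2 = rng.standard_normal((2, n, n))
A1 = b1 @ b1.T + n * np.eye(n)
A2 = b2 @ b2.T + n * np.eye(n)
m1, m2, x = rng.standard_normal((3, n))

direct = gaussian_logpdf(x, m1, A1) - gaussian_logpdf(x, m2, A2)
formula = log_lr_theorem_6_3(x, m1, m2, A1, A2)
print(direct, formula)
```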
LESS VULNERABLE CONFIDENCE AND SIGNIFICANCE
PROCEDURES FOR LOCATION BASED ON A SINGLE
SAMPLE : TRIMMING/WINSORIZATION 1


By JOHN W. TUKEY and DONALD H. McLAUGHLIN
Princeton University

SUMMARY. The vulnerability of Student's t, insofar as efficiency and power are concerned, leads to consideration of substitutes. Among the most promising are ratios of trimmed means to square roots of suitable quadratic forms involving the same order statistics. Matching, across underlying distributions, of ratios of average of denominator to variance of numerator leads to selection of the Winsorized sum of squared deviations as the basis for a denominator. The resulting trimmed t should prove more useful when the amount of trimming is made to depend on the individual sample in a suitably prescribed manner. Exact critical values for the resulting tailored t seem to require Monte Carlo computation, but use of a simple modified denominator for trimmed t allows us to use the conventional t tables as a reasonable approximation.
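To make the construction in the summary concrete, here is a minimal sketch of a trimmed t statistic: a trimmed mean in the numerator and the Winsorized sum of squared deviations in the denominator. The summary does not state the exact normalisation; the sketch assumes the commonly cited form in which, with $g$ observations trimmed from each end and $h = n - 2g$ retained, the Winsorized sum of squares is divided by $h(h - 1)$ and the statistic is referred to Student's t with $h - 1$ degrees of freedom. The function name and signature are hypothetical.

```python
import math


def trimmed_t(sample, g, mu0=0.0):
    """Trimmed t with g observations trimmed (and Winsorized) at each end.

    Returns the statistic and the assumed reference degrees of freedom h - 1.
    """
    xs = sorted(sample)
    n = len(xs)
    h = n - 2 * g
    if g < 0 or h < 2:
        raise ValueError("need g >= 0 and at least two untrimmed observations")
    kept = xs[g:n - g]
    trimmed_mean = sum(kept) / h
    # Winsorize: each trimmed value is replaced by the nearest retained value.
    winsorized = [kept[0]] * g + kept + [kept[-1]] * g
    w_mean = sum(winsorized) / n
    ssd_w = sum((v - w_mean) ** 2 for v in winsorized)
    t = (trimmed_mean - mu0) / math.sqrt(ssd_w / (h * (h - 1)))
    return t, h - 1


t_plain, df_plain = trimmed_t([1, 2, 3, 4, 5], g=0)    # reduces to Student's t
t_trim, df_trim = trimmed_t([1, 2, 3, 4, 100], g=1)    # the outlier 100 is Winsorized to 4
print(t_plain, df_plain, t_trim, df_trim)
```

With $g = 0$ the statistic coincides with the ordinary one-sample Student's t, which gives a quick consistency check on the assumed denominator.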

1. INTRODUCTION

One of us (Tukey, 1962, p.16) has already summarized the advantages of the class of symmetric distributions as a natural first step in our progress from statistical techniques understood and known to be useful for Gaussian (= normal) distributions alone to techniques understood and useful in very much more general situations. Once we are firmly established with statistical techniques understood and known to be useful for symmetrical distributions, it will be time to take a further step. But the problem of adequate mastery of the symmetrical-distribution case is enough for the moment.

Indeed it seems enough for the present to confine ourselves to techniques which are not only symmetric in their action on the class of all samples, but are symmetric in their action on individual samples. Such a restriction is clearly both less important and more easily removed than the restriction to symmetrical distributions.

The present account confines itself further: (i) to the single-sample-for-location problem and (ii) to the first steps of a specific approach to that problem. It describes the results of certain easily accessible calculations, and outlines what appear to be the plausible next steps, as well as indicating a little of what is known, or believed to be true, beyond this scope.


2. THE NORMAL AND THE PATHOLOGICAL : WHICH IS WHICH ?


The Gaussian or Laplacian distribution, to the physicist the Maxwellian
distribution, has long been known to the statistician as the normal distribution.
However little noticed such a commonly-used name becomes, shades of its original meaning
continue to cling: distributions that are not normal are, by at least slight implication,
pathological. In the early stages of development or exploration of a particular aspect
of statistics or data analysis, such an attitude may promote progress. But in a
well-developed area such an attitude can only be respectable if it reflects the facts, that is, if the
usual is at least close to the “normal” in behaviour. This is not the case in almost all
of the instances of data analysis with which the writers (and, they believe, most practising
statisticians) have come in contact. The typical distribution of errors and
fluctuations has a shape whose tails are longer than those of a Gaussian distribution. (See
Tukey (1960) for more extended discussion, and Mandelbrot (1960, 1961, 1962) for
newer instances arising in economics.)

1 Research supported by U.S. Army Research Office (Durham). Reproduction in whole or part
for any purpose of the U.S. Government permitted.


It is the Gaussian distribution that has to be regarded as somewhat patho- 
logical from the standpoint of practice. And distributions with shorter tails, while 
they do occur, are rather more pathological. Thus frequency of occurrence directs 
our attention to longer-tailed distributions. 


There is another, quite distinct and independent, reason for emphasizing 
long-tailed distributions. The prevalence of minimax-loss approaches to uncertainty 
is not an accident. We all tend to have more interest in avoiding a large loss than in 
obtaining a large gain. If procedures optimum for Gaussianity are used against long- 
tailed distributions they tend to behave poorly, both relatively and absolutely. Their 
quality is usually not only far from optimal (for the specific situation) but also of far 
lower quality than would have been the case if the underlying distribution were normal. 


Against short-tailed distributions, on the other hand, procedures optimum
for Gaussianity are not infrequently relatively poor but absolutely good, in the sense
that, while the optimum procedure for the specific situation would do much better,
the performance of the Gaussianly optimum procedure will be better for short-tailed
distributions than for Gaussian ones.


The use of $s^2$ as an indicator of scale is, of course, an outstanding instance of
the behaviour just discussed, both for short-tailed and long-tailed distributions.


Accordingly, it seems appropriate to begin by giving major attention to sym- 
metric distributions with tails longer than the Gaussian. 


3. MEASURES OF QUALITY AND INSENSITIVITY 


Statistical procedures are identified among the more general procedures of
data analysis, which themselves may or may not be based on a probability model,
by the fact that they take explicit account of uncertainty. From a narrow
viewpoint, the most important aspect of such a procedure is its validity, the extent to which
any associated statements of probability are correct, or at least conservative. Does
the nominal significance level really apply? Does the formal confidence interval
have (at least) the asserted probability of covering the true value? When such
questions are asked about the behaviour for other underlying distributions of a
procedure calibrated for Gaussianity, the conventional term is “robustness.” If we wish
to be clear and specific, we should (and shall) speak of “robustness of validity.”

But only a slightly broader view causes us to ask of a procedure not only
“Is it valid?” but “Is it efficient?”. While controlling its rate of error, does it do as
well, say as much, extract as much from the data, etc., as it can? And when we ask
such questions about the behaviour for other underlying distributions of a procedure
first developed for (near) Gaussianity we are asking about “robustness of efficiency.”

So long as we are to continue to use Gaussian underlying distributions as
the standard of calibration and the natural starting point, thus assuring validity in
the Gaussian situation (and this seems likely to be a long, long time), these arguments
suggest that we should give major attention for the present to

(1) efficiency for Gaussian distributions,

(2) robustness of validity for long-tailed symmetrical distributions,

(3) robustness of efficiency for long-tailed symmetrical distributions,

where the first probably deserves by far the least attention of the three.


4. LOCATION FROM A SINGLE SAMPLE : COMPETITORS AND CHALLENGES


If we are given a single (random) sample, $y_1, y_2, \ldots, y_n$, of observations from
$dF(y-\mu)$ where $F(v)+F(-v) = 1$ (so that the distribution of $y$ is symmetric around
$\mu$), a statistical technique which permits tests of significance also provides confidence
statements, and vice versa.

The most classical technique for this problem is one of many sorts of uses
of Student's t. Its rather moderate robustness of validity has been studied by a
number of workers (Pearson, 1929; Rietz, 1939; Gayen, 1949; Bradley, 1952). (In
two-sample and simple analysis-of-variance situations Student's t is much more robust.)
What we know about its behaviour can be summarized as follows :


(a1) The average value of the square of its denominator is in fixed ratio
to the variance of its numerator, independent of the underlying distribution.

(a2) Its robustness of validity for symmetric underlying distributions is
moderate, being quite high for significance or diffidence levels of 30-40% (Gayen,
1949) but not as satisfactory for the usual 5%, 1%, etc. levels. (Its behaviour for
unsymmetric underlying distributions is much less satisfactory.)

(a3) Contrary to most naive intuitions, confidence and significance will be
over-estimated by Student's t, not when the underlying distribution is longer-tailed
(as Gayen, 1949; Bradley, 1952; and Wonnacott, 1963 all agree) but rather when the
underlying distribution is shorter-tailed (as Rider, 1929; Perlo, 1933; Laderman,
1939; and Gayen, 1949 all agree).

(a4) Its robustness of efficiency is subject to serious question, especially since
a single wild-appearing observation can seriously affect both $\bar{y}$ and $s$.

(a5) The method of its calculation can easily be extended (or analogized)
to a very wide variety of situations without requiring changes in (Gaussian-theory)
critical values for this reason.

(a6) It provides confidence limits with little more effort than significance
tests.

(a7) If the underlying distribution should be Gaussian, these procedures
will be optimal according to almost every criterion.


Toward the other extreme we find techniques which can be based on ordering
deviations (of y's from a contemplated central value M) according to magnitude and
basing the test (or confidence interval) upon the pattern of signs of deviations, which
we will call the sign-configuration. This is most frequently and simply done by
scoring the ranks of one sign in some simple way.

Scoring each rank with its rank number is called the one-sample Wilcoxon
or signed-rank procedure and was introduced by Wilcoxon (1945, 1946, 1947,
1949). (For more available expositions see Moses, 1953; or Siegel, 1956, where the
procedure is applied to differences of paired observations.) In his thesis, Walsh
(1947, 1949) demonstrated a result equivalent to the fact that the probability of
obtaining any configuration of signs of deviations (when the deviations are ranked by
magnitude) is the same for all symmetric underlying distributions. (For a
clarification of the relationship of his results to Wilcoxon's see Walsh, 1959.)


The most important aspects of our knowledge of such sign-configuration
procedures can be summarized as follows :

(b1) Their robustness of validity is perfect for symmetric distributions.

(b2) Their robustness of efficiency has not been adequately studied.

(b3) The method of their calculation does not seem to be trivially extendable
to more general situations (such as regression coefficients); new tables of critical values
seem almost certain to be needed in any such extension.

(b4) The calculation of confidence intervals requires appreciably more effort
than significance testing, although trial-and-error is not required (Moses, 1953), a simple
graphical approach sufficing. (In more general situations, trial-and-error seems likely
to be necessary.)


(b5) If the underlying distribution should be Gaussian, the efficiency of
various of these procedures will be very high (Klotz, 1962), the loss of efficiency in
comparison with the “optimum” t-procedure being almost negligible in this situation.

If we were only concerned with the single-sample problem, and were able to
settle the question of robustness of efficiency for some sign-configuration procedure
favourably, it would be reasonable to argue that such a procedure was a reasonable
choice for routine work. Its only major defect would be its tendency, because of
(b4), to cause its users to stop with significance tests in many instances
where it would be profitable for them to push on to confidence intervals.

If we are to seek a new procedure, it should be one combining many of the
relative advantages of t-procedures and sign-configuration procedures.

As an ideal, possibly utopian, we might seek a procedure with these properties for symmetric
underlying distributions :

(c1) Its robustness of validity is high.

(c2) Its robustness of efficiency is satisfactorily large.

(c3) Its method of calculation can be rather easily extended to a wide variety
of situations with little change in critical values.

(c4) It provides confidence intervals almost as easily as significance tests.

(c5) If the underlying distribution should be Gaussian, its efficiency is high.

If we could find a procedure which met, or came close to meeting, these
specifications, it would be an outstanding candidate for adoption as the method of choice
for routine use.


5. A DESIRABLE DIRECTION OF EXPLORATION 


The criteria toward which we hope to make progress are diverse in kind
and character; it would be unrealistic to expect any formal optimization procedure
to actually lead us toward our goal. Accordingly, we have a choice between trying
to modify the things we understand, or seeking to be struck by the lightning of a
purely new approach. While waiting for lightning, we may as well proceed with
modification, which appears to be more easily accomplished starting with Student's t.

When reached through a formal optimization procedure, Student's t arises
as a single, integrated creation, and thus offers little guidance for modification.
But Student's t did not first arise in such a way. Its numerator and denominator have
very different conceptual origins, namely :

(d1) The numerator came first, as an intuitively effective point estimate
of deviation from contemplated value.

(d2) The denominator followed, as something which reflected (indeed was
an estimate of) the variability of the previously chosen numerator.

One road (to the writers the currently most promising road) toward the goal
set out in (c1) to (c5) is thus to begin by choosing a modified numerator, and then to
seek a matching denominator.

In the case of Student's t, where attention was concentrated on an underlying
Gaussian distribution (as was wholly appropriate when breaking new ground),
“matching” needed only to refer to this situation. The fact that the denominator
continued to estimate the variance of the numerator for all distributions was only a
bonus, albeit one that proved to be very important. In the present approach,
“matching” must refer to at least a modestly wide variety of symmetric distributions.
Granted that both frequency of occurrence and intensity of danger should
cause us to give particular attention to longer-tailed distributions, our modification
of the numerator must lie in the direction of attaching less weight to extreme (more
precisely, extreme-appearing) observations.
6. TRIMMING AND WINSORIZING

There are two sorts of simple modifications of an arithmetic mean which
especially deserve consideration in this context, both for reasons of simplicity and
for reasons arising from analyses of mathematical models to be reported elsewhere.

Given n ordered observations

\[ y_1 \le y_2 \le \cdots \le y_n \tag{1} \]

the (unweighted or equally weighted) arithmetic mean $\bar{y}$ or $y_\bullet$ is given by

\[ y_\bullet = \frac{1}{n}\,(y_1 + y_2 + \cdots + y_n) = \frac{1}{n}\sum y_i . \tag{2} \]

If $n = g+h+g$ (this mode of expression is chosen instead of $n = 2g+h$ to stress
ordering), the g-times (symmetrically) trimmed mean $y_{Tg}$ is given by

\[ y_{Tg} = \frac{1}{h}\,(y_{g+1} + y_{g+2} + \cdots + y_{n-g}) = \frac{1}{h}\sum_{(Tg)} y_i \tag{3} \]

and is the arithmetic mean of the set of h numbers obtained by dropping both the g
lowest and the g highest values from the $y_i$. Clearly $y_{Tg}$ pays less attention to extreme
values than does $y_\bullet$, which may be regarded as the 0-fold trimmed (= untrimmed)
mean.

A less intuitive contender is the g-times (symmetrically) Winsorized mean,
$y_{Wg}$, given by

\[ y_{Wg} = \frac{1}{n}\,(g\,y_{g+1} + y_{g+1} + y_{g+2} + \cdots + y_{n-g} + g\,y_{n-g}) = \frac{1}{n}\sum_{(Wg)} y_i \tag{4} \]

which is the arithmetic mean of the n values obtained by replacing (i) each of the g
lowest y's by the value of the nearest other y, namely $y_{g+1}$, and (ii) each of the g highest
y's by the value of the nearest other y, namely $y_{n-g}$. Again we have paid less
attention to individual extreme y's, but we have managed not to divert our attention from
the tails of the sample so thoroughly. Instead of replacing each deleted $y_i$ by $y_{Tg}$,
which is one reasonable interpretation of the calculation of $y_{Tg}$ since

\[ y_{Tg} = \frac{1}{n}\,(g\,y_{Tg} + y_{g+1} + y_{g+2} + \cdots + y_{n-g} + g\,y_{Tg}), \tag{5} \]

we have replaced each deleted y by the nearest retained y. (As noted by Dixon
(1960), this procedure has been called Winsorization in honour of Charles P. Winsor,
who actively sponsored its use in actual data analysis.)


We shall find it convenient to continue to use the notation just illustrated
throughout the discussion that follows, namely :

(e1) Replacement of a subscript by a “•” indicates a simple arithmetic mean.

(e2) Replacement of a subscript by “Tg” indicates a g-times trimmed mean.

(e3) Replacement of a subscript by “Wg” indicates a g-fold Winsorized mean.

(e4) The indication “(Tg)” on a summation sign indicates (g-times) trimmed
summation = summation over the subscripts remaining after the sample of y's is
trimmed g times on each tail.

(e5) The indication “(Wg)” on a summation indicates (g-times) Winsorized
summation = summation over the subscripts for which the y's were not deleted,
repeating each extreme undeleted subscript g+1 times and each other subscript once.


If we were concerned with but one amount of trimming or one amount of 
Winsorization we could well make use of simpler notations. (One such has been sug- 
gested at the end of (b2) on page 12 of Tukey, 1960.) But when concerned with several 
values of g it seems advisable to use a more explicit notation. 
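
As a minimal sketch of definitions (3) and (4), the two means can be computed as follows. The function names `trimmed_mean` and `winsorized_mean` and the sample values are illustrative assumptions, not part of the original account; the sample is sorted internally so that the ordering (1) holds.

```python
def trimmed_mean(ys, g):
    """g-times symmetrically trimmed mean y_Tg, as in (3)."""
    y = sorted(ys)                      # ordering (1)
    n = len(y)
    h = n - 2 * g                       # n = g + h + g
    if g < 0 or h < 1:
        raise ValueError("need g >= 0 and h = n - 2g >= 1")
    return sum(y[g:n - g]) / h          # mean of the h retained values


def winsorized_mean(ys, g):
    """g-times symmetrically Winsorized mean y_Wg, as in (4)."""
    y = sorted(ys)
    n = len(y)
    h = n - 2 * g
    if g < 0 or h < 1:
        raise ValueError("need g >= 0 and h = n - 2g >= 1")
    kept = y[g:n - g]
    # each deleted value is replaced by the nearest retained value
    return (g * kept[0] + sum(kept) + g * kept[-1]) / n


# Hypothetical sample with one wild-appearing observation (n = 7).
sample = [2.0, 4.0, 5.0, 6.0, 7.0, 9.0, 30.0]
plain = trimmed_mean(sample, 0)         # g = 0 gives the ordinary mean
once_trimmed = trimmed_mean(sample, 1)
once_winsorized = winsorized_mean(sample, 1)
```

With g = 0 both functions reduce to the ordinary arithmetic mean, which gives a quick consistency check on any implementation.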


7. CHOICE OF NUMERATOR 


For underlying distributions whose shapes are very close to Gaussian, the 
Winsorized means are less variable than trimmed means. While the efficiency for 
Gaussianity of trimmed means is quite high, the fractional loss being crudely 2g/3n 
(corresponding to efficiency of about 2/3 for the median), that of the corresponding 
Winsorized means is much higher. At the other extreme, where very long-tailed distri- 
butions are involved, trimmed means are clearly more efficient than Winsorized means. 
Where does the transition take place ? 


At the time when a previous account was written, preliminary analysis suggested
a moderately broad scope for the Winsorized mean (cf. Tukey, 1962, p. 18). Now
that further analysis (to be reported elsewhere) has been carried out, it would appear
that the trimmed mean is likely to be more widely useful than had been supposed.
As we shall see, this change in interpretation makes the present programme more
attractive.


8. CRITERIA OF MATCHING 


In striving to “match” denominators to a given numerator, we must choose
a criterion of matching. The natural, and we believe reasonable, choice is to begin
by following the example of Student's t and ask that

average value of denominator squared

and

variance of numerator

should be in constant proportion over as broad a spectrum of symmetrical distributions
as is reasonably convenient. This is, again, only a first step. Once we find a
denominator which matches well in this sense, we are ready to calculate critical values of the
ratio for various symmetrical distributions, and learn whether its validity is really
robust.

The numerators we consider are linear combinations of order statistics; their
variances will be linear combinations of variances and covariances of order statistics.
And the squared denominators are likely to be quadratic functions of order statistics;
their average values will depend on averages, variances, and covariances of order
statistics. Accordingly we naturally begin by turning to those symmetrical
distributions for which low moments of order statistics are available.

Three distributions are outstanding in this regard :


(f1) the rectangular distribution, for which low moments are available in closed
form,

(f2) the Gaussian distribution, for which 1st and 2nd moments are available
for sample sizes ≤ 20 (Teichroew, 1956; Sarhan and Greenberg, 1956; see alternatively
Teichroew, 1962),

(f3) one long-tailed distribution, the lambda distribution with λ = −0.1,
called the “special distribution” by Hastings, Mosteller, Tukey, and Winsor (1947),
for which they provided 1st and 2nd moments for sample sizes ≤ 10.

In addition to these distributions, published tables of low moments of order
statistics from symmetric distributions seem restricted to

(f4) the isosceles triangular distribution (Sarhan, 1954, p. 320) for sample
sizes up to 5, and

(f5) the double exponential distribution (Sarhan, 1954, p. 320) for sample
sizes up to 5.

In view of the central importance of the Gaussian distribution, and the ease
of handling order statistic moments for the rectangular distribution, the natural course
is to begin by trying to “match” for these two distributions and, once reasonable success
is obtained, to check the match for the other distributions, giving special emphasis to
the match for distributions longer-tailed than the Gaussian.


Work on the preparation of extensive tables of order statistic moments for 
lambda distributions is in progress. When these are available, somewhat better 
checks on matching will be possible. However, since the ultimate check is in terms 
of % points of the ratio rather than in terms of comparing individual values of numera- 
tor and denominator, the need for such further checks is not great. 

9. DENOMINATORS MATCHED TO THE TRIMMED MEAN : EARLY TRIALS

The denominator most naively associated with the trimmed mean, $y_{Tg}$, is
(some multiple of) the formal standard deviation of the trimmed sample, whose square
is proportional to the (g-times) trimmed sum of squared deviations (SSD)

\[ SSD_{Tg} = \sum_{(Tg)} (y_i - y_{Tg})^2 . \tag{6} \]

The most convenient of the ratios that must be nearly constant if we are to have
matching is

\[ \frac{\mathrm{ave}\; SSD_{Tg}}{\mathrm{var}\; y_{Tg}} = \mathrm{divisor}_T(g+h+g) \tag{7} \]


where the name “divisor” is justified by the fact that

\[ \sqrt{\frac{SSD_{Tg}}{\mathrm{divisor}_T(g+h+g)}} \tag{8} \]

is the natural normalization to an actual denominator for use with $y_{Tg}$. When we
investigate the behaviour of

\[ \frac{\text{Gaussian } \mathrm{divisor}_T(g+h+g)}{\text{rectangular } \mathrm{divisor}_T(g+h+g)} \tag{9} \]

where “Gaussian” and “rectangular” specify the distributions for which “ave” and
“var” are calculated (= the underlying distributions for which the appropriate forms
of (8) yield denominators the average of whose square equals the variance of $y_{Tg}$),
we find that (9) is moderately, but probably not satisfactorily, close to unity, as Table 1
shows. For more detail, see McLaughlin and Tukey (1961).


TABLE 1. VALUES OF THE RATIO (9) FOR SELECTED g AND h

h = size of            g = number of observations trimmed from each end
sample after
trimming           1        2        3        5        8        9
    2            1.009    1.007    1.005    1.003    1.002    1.001
    3            1.002    1.002    1.002    1.001    1.001
    4             .995     .996     .997     .998     .999
    5             .987     .989     .991     .995
    9             .966     .967     .971     .979
   10             .963     .962     .967     .976
   15             .951     .947
   18             .948


Reflection upon these results showed that, for better matching, the
denominator should be modified in such a way as to give more attention to the outlying
portions of the sample. This led to trials of various alternatives such as using $SSD_{Tg^*}$,
where $g^* < g$, with $y_{Tg}$, so that a less-trimmed sample provided the denominator.
Trial and consideration eventually led to investigation of

\[ SSD_{Wg} = \sum_{(Wg)} (y_i - y_{Wg})^2 , \tag{10} \]

the equally-many-times Winsorized sum of squares of deviations, as a basis for a
denominator to be used with the trimmed mean $y_{Tg}$.
The matching of the corresponding divisor for different distributions

\[ \mathrm{divisor}_W(g+h+g) = \frac{\mathrm{ave}\; SSD_{Wg}}{\mathrm{var}\; y_{Tg}} \tag{11} \]

has to be examined. The first check is again of

\[ \frac{\text{Gaussian } \mathrm{divisor}_W(g+h+g)}{\text{rectangular } \mathrm{divisor}_W(g+h+g)} \tag{12} \]

with the results shown in Table 2.

TABLE 2. VALUES OF THE RATIO (12) FOR SELECTED g AND h

h = size of            g = number of observations trimmed from each end
sample after
trimming           1         2         3         5         8         9
    2            1.0092    1.0071    1.0053    1.0031    1.0017    1.0014
    3            1.0052    1.0045    1.0036    1.0023    1.0013
    4            1.0037    1.0033    1.0027    1.0018    1.0011
    5            1.0031    1.0023    1.0021    1.0015
    9            1.0026    1.0021    1.0015    1.0009
   10            1.0026    1.0020    1.0015    1.0009
   15            1.0024    1.0020
   18            1.0023


The line for h = 2, where only the 2 central observations are retained in either
the trimmed or Winsorized samples, must be the same in Tables 1 and 2. Elsewhere
in Table 2, only one entry rises as much as 0.5% above unity. (As compared with a
fall of more than 5% for the t-denominator.) And since $\mathrm{divisor}_W(g+h+g)$ will appear
under a square root, this corresponds to a suggested difference in critical value between
rectangular and Gaussian of 1 part in 400. (As much as almost 1 in 200 when only
the two central values survive trimming.) For the direction of easy computation but
of lesser importance, the direction of shorter tails than Gaussian, the suggested
behaviour of $\mathrm{divisor}_W(g+h+g)$ is close to excellent. What of the other side ?
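
The arithmetic behind the “1 part in 400” and “almost 1 in 200” figures can be checked directly. Taking the largest entry of Table 2 outside the h = 2 row (1.0052) and the largest entry in the h = 2 row (1.0092), and using the first-order approximation $\sqrt{1+x} \approx 1 + x/2$, a worked sketch is:

```latex
\[
\sqrt{1.0052} \approx 1 + \tfrac{1}{2}(0.0052) = 1.0026 \approx 1 + \tfrac{1}{385},
\qquad
\sqrt{1.0092} \approx 1 + \tfrac{1}{2}(0.0092) = 1.0046 \approx 1 + \tfrac{1}{217}.
\]
```

Both values agree, to the rough precision intended, with the fractions quoted in the text.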


Table 3 presents the available values for

\[ \frac{\text{special } \mathrm{divisor}_W(g+h+g)}{\text{rectangular } \mathrm{divisor}_W(g+h+g)} \tag{13} \]

where “special” refers to the lambda distribution with λ = −0.1 (cf. Hastings, Mosteller,
Tukey, and Winsor (1947)), which can be roughly thought of as a t distribution with
5 degrees of freedom. Most of the values are close to 1.01, a value which suggests
a 0.5% difference in critical values between the rectangular and the special.


TABLE 3. VALUES OF (13) FOR AVAILABLE g AND h

h = size of            g = number of observations trimmed from each end
sample after
trimming           1        2        3        4
    2            1.025    1.018    1.013    1.010
    3            1.016    1.012    1.009
    4            1.013    1.009    1.007
    5            1.011    1.008
    6            1.010    1.007
    7            1.010
    8            1.010

The remaining easy comparisons are made in Table 4, which offers no reason 
to change the conclusions already reached. 


TABLE 4. OTHER RATIOS OF VALUES OF $\mathrm{divisor}_W(1+h+1)$

distribution compared       h = size of sample      distribution divisor /
with the rectangular        after trimming          rectangular divisor
isosceles triangular              2                       1.007
                                  3                       1.014
double exponential                2                       1.041
                                  3                       1.037


The “suggestion” of a somewhat larger divisor for longer-tailed distributions
requires some consideration at this point. What we are finding is that the divisor,
defined as

\[ \frac{\mathrm{ave}\,(\text{denominator})^2}{\mathrm{var}\;\text{numerator}} , \tag{14} \]

increases slightly as we pass from a very short-tailed distribution to a quite long-tailed
one. In the case of Student's t, this ratio does not change at all, but Gayen's results
suggest that, for modest numbers of degrees of freedom, the corresponding critical
values change (decrease) by several times the fractions with which we are concerned.
Since we would not expect this effect to be so prominent for trimmed and Winsorized
statistics, it seems likely that the matching behaviour of $SSD_{Wg}$ as a denominator,
at least so far as $\mathrm{divisor}_W(g+h+g)$ is concerned, is all that we can ask at this stage.
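
The divisor of (14), with the Winsorized SSD as squared denominator and the trimmed mean as numerator, can also be estimated by simulation rather than from tabulated moments of order statistics. The sketch below is a Monte Carlo illustration only, not the moment-based calculation used for the tables; the helper names (`winsorized_ssd`, `trimmed_mean`, `estimate_divisor`), the number of replications, and the seed are assumptions chosen for the example.

```python
import random


def winsorized_ssd(ys, g):
    """Winsorized sum of squared deviations about the Winsorized mean."""
    y = sorted(ys)
    n = len(y)
    kept = y[g:n - g]
    w = [kept[0]] * g + kept + [kept[-1]] * g   # Winsorized sample of size n
    m = sum(w) / n
    return sum((v - m) ** 2 for v in w)


def trimmed_mean(ys, g):
    """g-times symmetrically trimmed mean."""
    y = sorted(ys)
    n = len(y)
    return sum(y[g:n - g]) / (n - 2 * g)


def estimate_divisor(draw, g, h, reps=40000, seed=0):
    """Monte Carlo estimate of ave(SSD_Wg) / var(y_Tg) for n = g + h + g.

    `draw` is a function taking a random.Random instance and returning one
    observation from the underlying distribution.
    """
    rng = random.Random(seed)
    n = g + h + g
    ssd_total = 0.0
    means = []
    for _ in range(reps):
        ys = [draw(rng) for _ in range(n)]
        ssd_total += winsorized_ssd(ys, g)
        means.append(trimmed_mean(ys, g))
    ave_ssd = ssd_total / reps
    grand = sum(means) / reps
    var_mean = sum((m - grand) ** 2 for m in means) / (reps - 1)
    return ave_ssd / var_mean


# Rectangular and Gaussian parents, g = 1 and h = 2 (so n = 4).
rect_divisor = estimate_divisor(lambda rng: rng.random(), g=1, h=2)
gauss_divisor = estimate_divisor(lambda rng: rng.gauss(0.0, 1.0), g=1, h=2)
```

For h = 2 both estimates should land near h(h−1) = 2, up to simulation noise of roughly a percent; differences as small as those tabulated above would need far more replications, or exact moments, to resolve.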
11. APPROXIMATION TO THE DENOMINATOR: TRIMMED t

Depending upon the numerical behaviour of the divisor, we have a number
of choices in putting the resulting procedure into approximate practice. If its behaviour
is simple enough, we may calculate a “trimmed t” using a convenient approximation
to the divisor, and then compare the results with, as an approximation, the
(normal-theory) critical values of Student's t, or, more precisely, with modified critical values
appropriate to the precise distribution (say on normal theory) of “trimmed t”. Whether
or not the behaviour of the divisor is simple, we can always choose a convenient
working divisor and then tabulate the appropriate critical values. Clearly the first
of these possibilities is the more desirable. Does $\mathrm{divisor}_W$ behave simply?

Table 5 shows the ratio of the normal-theory values of $\mathrm{divisor}_W$ to $h(h-1)$,
all values falling between 1.00 and 1.02. (For rectangular-theory values, see Section
19.) Clearly we will do quite well to use $h(h-1)$ as the working divisor, especially
when we recall that using a slightly undersized divisor corresponds to using slightly
longer confidence intervals, and is thus slightly conservative.
TABLE 5. RATIO OF NORMAL THEORY $\mathrm{divisor}_W$ TO $h(h-1)$

h = size of            g = number of observations trimmed from each end
sample after
trimming           1        2        3        5        8        9
    2            1.009    1.007    1.005    1.003    1.002    1.001
    3            1.016    1.015    1.013    1.009    1.006
    4            1.016    1.016    1.015    1.011    1.008
    5            1.015    1.016    1.015    1.012
    9            1.010    1.012    1.012    1.011
   10            1.010    1.011    1.011    1.010
   15            1.007    1.008
   18            1.006


The next question, of course, has to do with the approximate distribution
of the result, particularly in the vicinity of the conventional tail areas. To help us with
this problem there are two pieces of information :

(g1) Dixon and Tukey (1963) have studied the approximate distribution of
Winsorized t, where $y_{Wg}$ (rather than $y_{Tg}$) is combined with $s_{Wg}$. The results of this
study indicate a quite Student's-t-like distribution, with the best fit obtainable for a
number of degrees of freedom typically somewhat less than h−1 (= the number
corresponding to a sample of size equal to the trimmed sample).

(g2) R. A. Jensen has made some preliminary experimental sampling and
Monte Carlo investigations into the critical values of $t_{Tg}$. These suggest, rather than
indicate, that these critical values may be rather close to those of Student's t for h−1
degrees of freedom.


Accordingly, if we wished to use trimmed t on an approximate basis (but see Sections
14 to 16), we would calculate

\[ t_{Tg} = \frac{y_{Tg} - M}{\sqrt{SSD_{Wg}/h(h-1)}} \tag{15} \]

where M is a contemplated value for the center of the distribution sampled, and refer
the result to Student's t on h−1 degrees of freedom. And we could, as a next step,
proceed to a more precise determination of actual critical values for (15).
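
The calculation in (15) can be sketched in a few lines. The function name `trimmed_t` and the sample values are illustrative assumptions; the code follows the definitions given above (trimmed mean in the numerator, Winsorized sum of squared deviations with working divisor h(h−1) in the denominator) and returns the statistic together with the h−1 degrees of freedom to which it is approximately referred.

```python
import math


def trimmed_t(ys, g, M=0.0):
    """Approximate trimmed t of (15) and its reference degrees of freedom.

    Returns (t, df), where df = h - 1 and h = n - 2g is the size of the
    trimmed sample. M is the contemplated central value.
    """
    y = sorted(ys)
    n = len(y)
    h = n - 2 * g
    if g < 0 or h < 2:
        raise ValueError("need g >= 0 and h = n - 2g >= 2")
    kept = y[g:n - g]
    y_trim = sum(kept) / h                        # trimmed mean
    w = [kept[0]] * g + kept + [kept[-1]] * g     # Winsorized sample
    y_wins = sum(w) / n                           # Winsorized mean
    ssd_w = sum((v - y_wins) ** 2 for v in w)     # Winsorized SSD
    if ssd_w == 0.0:
        raise ValueError("Winsorized SSD is zero; t is undefined")
    t = (y_trim - M) / math.sqrt(ssd_w / (h * (h - 1)))
    return t, h - 1


# Hypothetical sample with one wild-appearing observation, tested against M = 5.
sample = [2.0, 4.0, 5.0, 6.0, 7.0, 9.0, 30.0]
t1, df1 = trimmed_t(sample, g=1, M=5.0)
t0, df0 = trimmed_t(sample, g=0, M=5.0)   # g = 0 reduces to Student's t
```

With g = 0 the Winsorized SSD is the ordinary SSD and the working divisor is n(n−1), so the function returns the usual Student's t on n−1 degrees of freedom.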
12. MATCHING TO THE WINSORIZED MEAN: A QUERY


Turning now to the Winsorized mean, analogy with the results just described
leads us to begin with denominators which pay more attention to the tails of the sample
than does the numerator. Working with the values of the trimmed sample alone,
an attempt to match drives us rapidly to the (g-times) inner range

\[ W_{Tg} = y_{n-g} - y_{g+1} . \tag{16} \]

The corresponding divisor

\[ \mathrm{divisor}(g+h+g) = \frac{\mathrm{ave}\; W_{Tg}^2}{\mathrm{var}\; y_{Wg}} \tag{17} \]

does not appear to be so satisfactorily constant as we change the underlying
distribution, as Table 6 shows.
TABLE 6. VALUES OF (Gaussian divisor (g+h+g)) / (rectangular divisor (g+h+g))

h = size of            g = number of observations trimmed from each end
sample after
trimming           1        2        3        5        8        9
    2            1.009    1.007    1.005    1.003    1.002    1.001
    3             .962     .985     .989     .994     .997
    4             .961     .963     .971     .982     .990
    5             .950     .945     .953     .968
    9             .961     .914     .900     .922
   10             .970     .914     .903     .913
   15            1.022
   18            1.066

The most natural ways to seek improvement are (i) the use of observations
not in $y_{Wg}$ to help assess its variability, (ii) the use of a denominator in which differences
among central observations are subtracted from $W_{Tg}$ in order to further emphasize
the behaviour of the ends of the trimmed sample. Both of these have strong heuristic
disadvantages : the first because we may lose the advantage of getting wholly free of
the observations excluded from the trimmed sample, with the consequence that, in
very long-tailed distributions, we may improve our numerator rather more than we
improve our estimate of its performance; the second because such subtractions, if
effective, must tend to decrease the relative stability of the denominator (measured,
if you will, in “effective degrees of freedom”).

Whether either of these approaches, or some other, such as using

\[ W_{Tg} + A\,|y_{Wg} - y_{Tg}| \tag{18} \]

for a suitable value of A, will prove satisfactory, seems likely to be better investigated
by working with actual distributions of ratios and the corresponding critical values
rather than with ratios of moments. If this be so, it will be well to gain experience
first with the distributional behaviour of $t_{Tg}$.

13. POWER OF TRIMMED t


In connection with a study of Monte Carlo methods adapted to statistical
problems of sampling from non-normal distributions, Wonnacott (1963) has
investigated some selected aspects of the power of trimmed t both when the underlying
distribution is Gaussian and when it is a symmetrical Johnson (1949) distribution with low
moments corresponding to a t-distribution with 6 or 4.7 degrees of freedom. In general
his results are as would have been expected, though the comparison of Student's t
and Wilcoxon-Walsh procedures is, surprisingly, somewhat unfavourable to the latter.

A few highlights of Wonnacott’s (1963) comparison of powers for non-normal 
underlying distributions are these : 

(h1) Singly trimmed t for n = 10 is more powerful than Student’s t for the 
Johnson distribution with moments matching tę. 


(h2). Five-times trimmed t for n = 20 is very close to Student’s ¢ for the same’ 
underlying distribution. 


(h3) Three-times trimmed t for n = 10 is almost as powerful as the nearest 
Wilcoxon-Walsh procedure for the Johnson distribution with moments matching 
tie. 

In general, the prospects for the effectiveness of $t_{Tg}$ seem very good. 


14. WHAT SHOULD WE EXPECT TO BE USED ? 


Once we have obtained all desired critical values of $t_{Tg}$, what are we to do 
in practice? Let us suppose the error rate (here = significance or diffidence level) 
at which we are going to work has been fixed, once for all. This assumption is of 
course unrealistic, but it enables us to look more clearly at the issues which concern 
us most immediately. We may as well also fix n = g+h+g. 


Let, then, 

$a_g$ = critical value of $t_{Tg}$    (19) 

and consider a man with a single sample of y’s. He has a wide variety of choices. 
He may take any of : 

point estimate     interval estimate 

$y_{T0}$     $y_{T0} \pm a_0 \sqrt{SSD/[n(n-1)]}$ 
$y_{T1}$     $y_{T1} \pm a_1 \sqrt{SSD_{W1}/[(n-2)(n-3)]}$ 
$y_{T2}$     $y_{T2} \pm a_2 \sqrt{SSD_{W2}/[(n-4)(n-5)]}$    (20) 
$y_{T3}$     $y_{T3} \pm a_3 \sqrt{SSD_{W3}/[(n-6)(n-7)]}$ 
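
The entries of (20) can be computed mechanically. The following is a minimal sketch, assuming that the point estimate is the mean of the h = n-2g central observations and that the Winsorized SSD is the sum of squared deviations of the sample after each of the g lowest (highest) values has been replaced by its nearest retained neighbour. The function names are illustrative only, and the critical value `a_g` must be supplied by the user, since no table of such values is given here.

```python
import math


def trimmed_summary(sample, g):
    """Return (trimmed mean, Winsorized SSD, h) after trimming g values from each end."""
    ys = sorted(sample)
    n = len(ys)
    h = n - 2 * g
    if g < 0 or h < 2:
        raise ValueError("need g >= 0 and h = n - 2g >= 2")
    kept = ys[g:n - g]
    # Winsorize: the g discarded values at each end take the nearest retained value.
    winsorized = [kept[0]] * g + kept + [kept[-1]] * g
    w_mean = sum(winsorized) / n
    ssd_w = sum((w - w_mean) ** 2 for w in winsorized)
    return sum(kept) / h, ssd_w, h


def interval_estimate(sample, g, a_g):
    """Interval of the form (20): trimmed mean +/- a_g * sqrt(SSD_W / (h(h-1)))."""
    centre, ssd_w, h = trimmed_summary(sample, g)
    half_width = a_g * math.sqrt(ssd_w / (h * (h - 1)))
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 100.0]
    print(trimmed_summary(data, 1))
    # a_g = 4.303 is Student's 5% value for 2 d.f., used here only as a stand-in.
    print(interval_estimate(data, 1, 4.303))
```

With g = 0 the interval reduces to the ordinary Student interval, provided `a_g` is then taken from Student's table with n-1 degrees of freedom.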

We know that, if his underlying (symmetrical) distribution behaves “reasonably” 
(in quotes because we do not yet know what is “reasonable” and what is not), 
systematic use without exception of any single one of the interval estimates will offer 
him validity, those with larger g probably being more robust in this regard. But 
our motivation in entering upon this whole question was to improve efficiency while 
maintaining validity. And we are almost certain that the qualitative behaviour of 
relative efficiency will appear as in Figure 1. 


[Figure 1: relative efficiency (vertical axis) against length of tail (horizontal axis, running from short tails through Gaussian to long tails).] 

Fig. 1. Anticipated qualitative behaviour of relative efficiency. 

Knowing this, it is most unlikely that the user will be content to use any 
single value of g without exception. When his samples are long-tailed, thus appearing 
to come from long-tailed distributions, he will want to use values of g > 0. When 
they look “Gaussian” he will wish to use g = 0 (=Student’s t). (And it is conceivable 
that some will feel that they sometimes really do have an underlying distribution with 
shorter tails, and will wish to try to take advantage of this by occasional use of a 
numerator more of the character of the mid-range. But this, for the reasons discussed 
in Section 5, is well treated as a second-order perturbation.) 


15. IS THIS WISE ? 


At first glance it looks as though such a user has started upon a dangerous 
course. All those who have handled much data know, often largely instinctively, 
how dangerous it would be to trust a single small sample to tell us with any precision 
about the shape of the distribution from which it came. And it tends to appear that 
this is what a man is doing when he picks a g in the light of the specific sample to which 
he is to apply the chosen procedure. 

Let us turn to a simpler, classical situation. What of the man with a sample 
of, say, size 3 from a distribution known to be Gaussian? He has an estimate of $\sigma^2$ 
based on 2 degrees of freedom. If he believes that he “knows” $\sigma^2$ from other evidence, 
and acts accordingly, he is often in bad trouble. If he makes a significance test or 
states a confidence interval as if $\sigma^2$ were known, he will indeed be incurring much 
larger risks than he claims. It has been 55 years since Student (1908) showed the 
way out of this dilemma. We have only to plan to use an estimate, $s^2$, of $\sigma^2$ based on 
these observations, to admit that this choice will be fallible, and to ask how we must 
readjust the critical values to allow for this fallibility. Doing this for the use of $s^2$ 
for $\sigma^2$, which takes us to the distribution of Student’s t, is a familiar operation, now 
regarded as logically simple. The only possible difficulties are computational ones 
associated with the calculation of appropriate critical values, once the rule for choosing 
the estimate of $\sigma^2$ is fixed. (Thus, for example, the answers for Lord’s t (Lord, 1947, 
1950), where this is based upon range, are somewhat different from those for 
Student’s t.) 
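
The size of the readjustment in the classical case is easily illustrated. The sketch below, an illustration rather than part of the original argument, takes a sample of size 3, substitutes s for $\sigma$, and compares the risk actually incurred when the Gaussian 5% value 1.96 is retained with the risk claimed. It uses the closed form P(|t| > c) = 1 - c/sqrt(2 + c^2), which holds for Student's t with 2 degrees of freedom; the function names are illustrative.

```python
import math


def normal_two_sided(c):
    """P(|Z| > c) for a unit Gaussian Z."""
    return math.erfc(c / math.sqrt(2.0))


def t2_two_sided(c):
    """P(|t| > c) for Student's t with 2 degrees of freedom (closed form)."""
    return 1.0 - c / math.sqrt(2.0 + c * c)


def t2_critical(alpha):
    """Critical value c with P(|t| > c) = alpha for 2 degrees of freedom."""
    p = 1.0 - alpha
    return math.sqrt(2.0 * p * p / (1.0 - p * p))


if __name__ == "__main__":
    print("claimed risk :", round(normal_two_sided(1.96), 4))
    print("actual risk  :", round(t2_two_sided(1.96), 4))
    print("adjusted 5% critical value:", round(t2_critical(0.05), 3))
```

The claimed 5% becomes roughly 19%, and restoring the claimed rate requires raising the critical value from 1.96 to about 4.30. The adjustment sought for tailored t is of the same logical kind.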


The situation for allowance for long tails is a similar one. Once we fix a way 
for the user to choose a given value of g, and decide what the form of the adjustment 
to the critical value will be, no logical difficulty remains. There is a system of critical 
values which will enable a man who behaves as specified, without exception, to make 
statements which will be valid (for appropriate underlying distributions). The only 
difficulty is a purely computational one: find the amount of the needed change in 
critical values. 


16. INDIVIDUALLY TRIMMED t = TAILORED t 

The selection of the exact procedure for choosing a value of g, so long as this 
procedure is sensible, is a matter of little importance. At first glance there are at 
least two plausible alternatives, namely : 

Choose that g which minimizes $SSD_{Wg}/[h(h-1)]$    (21) 

and 

Choose that g which minimizes $a_g \sqrt{SSD_{Wg}/[h(h-1)]}$.    (22) 


However, it is only for distributions with the very longest tails that it will be sensible 
to choose values of g that discard almost all the sample values. (The cost of discarding 
so many sample values will not be paid in terms of the stability of $y_{Tg}$, which cannot 
be made worse than the stability of the median, itself excellent in sampling from 
long-tailed distributions, but rather in terms of the stability of the denominator, that is, 
the stability of the estimate of variability of $y_{Tg}$.) The simplest means of avoiding 
difficulty in this connection will be to introduce a fixed function G(n) and choose g 
according to: 

Choose that g < G(n) which minimizes $SSD_{Wg}/[h(h-1)]$    (23) 

or 

Choose that g < G(n) which minimizes $a_g \sqrt{SSD_{Wg}/[h(h-1)]}$.    (24) 

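
A rule of type (23) is simple to state as a procedure. The sketch below is an illustration under stated assumptions: the bound is passed in as `g_max`, the largest g the user is willing to consider (how this relates to G(n) is left open, since no specific G(n) is proposed in the text), and ties are broken in favour of the smaller g. Both conventions are choices made here, not the authors'.

```python
def winsorized_ssd(sorted_sample, g):
    """Sum of squared deviations of the sample Winsorized g times at each end."""
    n = len(sorted_sample)
    kept = sorted_sample[g:n - g]
    winsorized = [kept[0]] * g + kept + [kept[-1]] * g
    mean = sum(winsorized) / n
    return sum((w - mean) ** 2 for w in winsorized)


def choose_g(sample, g_max):
    """Rule of type (23): the g in 0..g_max minimizing SSD_W(g) / (h(h-1)), h = n - 2g."""
    ys = sorted(sample)
    n = len(ys)
    best_g, best_value = None, None
    for g in range(0, g_max + 1):
        h = n - 2 * g
        if h < 2:
            break  # too few observations left to estimate variability
        value = winsorized_ssd(ys, g) / (h * (h - 1))
        if best_value is None or value < best_value:
            best_g, best_value = g, value
    return best_g


if __name__ == "__main__":
    print(choose_g([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], g_max=3))  # one wild value
    print(choose_g([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], g_max=3))   # well-behaved sample
```

Rule (24) would differ only in multiplying the square root of each candidate value by the corresponding critical value $a_g$ before comparison.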
The choice between (23) and (24), for a given G(n), is not likely to have a 
substantial effect on the adjustment required for the critical values. Nor is it likely 
to have any substantial effect upon the way in which relative efficiency depends on 
length of tail. The choice between (23) and (24) will almost surely be based upon 
considerations pertinent to the user. 

There are three such which seem likely to be of major importance: 

(j1) It is slightly easier to use (23), since the square-rootings and multiplications 
by $a_g$ required in the criterion of (24) are avoided. 

(j2) The use of (24) may be more palatable, since one begins by seeking 
out the g for which the naive limits of (20) are closest together, and one retains this 
best-seeming value of g. 

(j3) The use of (24) makes it very easy to make a conservative correction 
to the confidence limits set by a man who has used the naive limits of (20), perhaps 
without revealing how he chose g. 

The strength of (j3) is of course greatest when the change in critical values 
takes the form 

$b_{n,G} \cdot a_g$    (25) 

so that the resulting confidence limits are 

$y_{Tg} \pm b_{n,G} \cdot a_g \cdot \sqrt{SSD_{Wg}/[h(h-1)]}$    (26) 

where g is chosen by (24). 

17. PURELY A MATTER OF SELECTION ? 

One attitude toward the problems we have just been discussing would be 
that they are purely matters of allowance for selection of procedure, another routine 
instance of a broad problem which routinely faces us in the analysis of data: There 
is a single set of data, and several alternative procedures by which it could be analyzed. 
While it is possible to retain validity by pledging that one will always use a single 
procedure, it is clear that failure to do something to adapt the procedure to the data 
leads to a loss of effectiveness. But if we naively select the procedure that appears 
to work best in each specific instance, we are exposing ourselves to a certain loss of 
validity. 

There are almost always two ways out of such a bind, the one we propose to 
adopt, namely calculating adjusted critical values which allow for the choice, by 
a prescribed rule, of the apparently most appropriate procedure, and one which is 
rather in the spirit of Robbins’ empirical Bayes techniques (Robbins, 1951, 1955; 
see also Neyman, 1962). In the second approach, one would regard an individual 
batch of data as one of a family of such batches, and plan to borrow information from 
the other batches to determine how we are to proceed with the first. (This general 
procedure is of course adopted daily, often at a very general level, by every working 
statistician whose familiarity with data of a certain class or classes helps him to 
choose among procedures for its analysis.) 

If we wish to appear “whiter than driven snow” insofar as selection is con- 
cerned, we may decide to choose the procedure to be used upon a given batch of data, 
solely upon the evidence offered by the other batches of the family. If (i) the batches 
are independent, and (ii) their assignment to a “family” was without regard to their 
behaviour, even the most vehement and inquisitive seeker for selection bias could not 
ascribe any to the way in which the procedure was selected. 

Why did we not propose such an approach to the selection of g? It was, 
we believe, because of a feeling about what will eventually be discovered to be the 
case. Specifically, suppose that the shape of the underlying distribution is fixed so 
that we are applying some procedure, or mixture of procedures, to samples drawn 
from distributions with this fixed shape. On the basis of available evidence and 
insights it is reasonable to suppose that one can do better, even after making due 
allowance for the $b_{n,G}$ factor that will then be required, by using different values of 
g for samples of different apparent long-tailedness, than by using any one fixed 
value of g. 


For a specific shape of distribution, it is a factual question whether “mixing 
the g's” is advantageous or not. In time we should know the answer for certain specific 
distributions. In the meantime, we can do no better than follow our best 
judgement. 


One could, if one wished, repeat an analogous argument about G(n), which, 
in contrast to g, we have implicitly proposed to determine from a whole large family 
of batches of data. We have chosen to avoid making such an argument, and feel 
quite happy, on our present knowledge and insight, in making a sharp distinction 
between : 

(k1) G(n) to be picked on the evidence of a family of batches, and 

(k2) g to be picked, subject to g < G(n), on the evidence of the single batch 
in question. 

To what extent this distinction is made because of a deeper understanding of the role 
of g, and to what extent it reflects a real distinction between what the user can gain 
from individualized choice of g in comparison with the individualized choice of G(n), 
is hard to say. 


18. REQUIRED NEXT STEPS 


What then are the next steps to be taken in making tailored t properly available 
for use? Some of them, surely, are these : 

(l1) Determine the values of $a_g$ for an adequate net of values of g and n, 
and suitable error rates. 

(l2) Choose a reasonable function G(n) and determine the values of $b_{n,G}$ for 
a suitable pattern of values of g, n, and error rate. (A considerably sparser pattern 
may suffice for adequate interpolation.) 

(l3) Investigate the power of the resulting procedure, both for a Gaussian 
underlying distribution and for longer-tailed underlying distributions. Comparison 
with both Student's t and sign-configuration procedures will be in order. 

(l4) Make a start on extending the ideas and insights gained in the single-sample 
situation to more general situations. 

Once reasonable progress has been made with (l1) and (l2), tailored t will be 
fully usable. We would expect to recommend it for routine use at that time. 

So far as one can see, all the steps just mentioned demand Monte Carlo techniques 
for their solution, although we would not exclude the possibility of an analytic 
attack on some of the simpler ones. It should be emphasized that by Monte Carlo 
we do not mean naive experimental sampling, where one merely draws samples from 
the postulated underlying distribution and calculates a trimmed-t value from each, 
thus building up an empirical distribution approaching that of trimmed t like $n^{-1/2}$, 
where n here denotes the number of samples drawn. 
Such naive procedures waste computational effort, and drastically reduce the accuracy 
of results that can be reached with plausible amounts of effort. Instead one can 
plan to use as much as possible of one's knowledge and insight in designing a modified 
sampling scheme whose results estimate a number known to be the same as the number 
estimated by the naive procedure. (See Kahn, 1956 for general discussion, and Arnold, 
Bucher, Trotter, and Tukey, 1956, or Wonnacott, 1963 for specific examples.) 
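
For orientation, the naive experimental sampling described above, which the text advises against for serious work, can be sketched as follows. The sketch assumes a Gaussian underlying distribution, takes trimmed t to be the trimmed mean divided by the square root of the Winsorized SSD over h(h-1), and reads the critical value off the empirical distribution. The names `trimmed_t` and `naive_critical_value`, the number of repetitions and the seed are all illustrative choices.

```python
import math
import random


def trimmed_t(sample, g):
    """Trimmed t about zero: trimmed mean / sqrt(Winsorized SSD / (h(h-1)))."""
    ys = sorted(sample)
    n = len(ys)
    h = n - 2 * g
    kept = ys[g:n - g]
    winsorized = [kept[0]] * g + kept + [kept[-1]] * g
    w_mean = sum(winsorized) / n
    ssd_w = sum((w - w_mean) ** 2 for w in winsorized)
    return (sum(kept) / h) / math.sqrt(ssd_w / (h * (h - 1)))


def naive_critical_value(n, g, alpha=0.05, reps=20000, seed=1):
    """Empirical two-sided critical value of trimmed t from `reps` Gaussian samples."""
    rng = random.Random(seed)
    values = sorted(
        abs(trimmed_t([rng.gauss(0.0, 1.0) for _ in range(n)], g))
        for _ in range(reps)
    )
    return values[math.ceil((1.0 - alpha) * reps) - 1]


if __name__ == "__main__":
    # For g = 0 the result should lie near Student's 5% value for 9 d.f. (2.262).
    print(naive_critical_value(10, 0))
    print(naive_critical_value(10, 1))
```

Because the sampling error falls off only as the inverse square root of the number of repetitions, each further decimal place costs roughly a hundredfold more sampling, which is the waste the modified schemes are meant to avoid.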

While we plan to work on these problems at Princeton, we would welcome 
activity by others. 


19. ALGEBRA FOR THE RECTANGULAR CASE 


In closing we should set down the algebra which shows that 

rectangular divisor (g+h+g) = h(h-1).    (27) 

For a rectangular distribution with ends at 0 and 1 we have the well-known results 
for the low moments of the order statistics $y_1 \le y_2 \le y_3 \le \dots \le y_n$ 

$\mathrm{ave}\; y_i = \frac{i}{n+1}$ for all i, 

$\mathrm{var}\; y_i = \frac{i(n+1-i)}{(n+1)^2(n+2)}$,    (28) 

$\mathrm{cov}(y_i, y_j) = \frac{i(n+1-j)}{(n+1)^2(n+2)}$ for $i \le j$, 

whence 

$\mathrm{var}(y_j - y_i) = \frac{(j-i)(n+1-j+i)}{(n+1)^2(n+2)}$    (29) 

and 

$\mathrm{ave}\,(y_j - y_i)^2 = \frac{(j-i+1)(j-i)}{(n+1)(n+2)}$    (30) 

which depends upon j-i and n alone, as would be expected from the symmetric 
distribution of equivalent blocks. 

In view of the general identity 

$2n \sum_i (u_i - u_\cdot)^2 = \sum_i \sum_j (u_i - u_j)^2$    (31) 

we have, using the Winsorized observations as the u, 

$2n\, SSD_{Wg} = \sum_{i=g+1}^{n-g} \sum_{j=g+1}^{n-g} (y_j - y_i)^2 + 2g \sum_{i=g+1}^{n-g} (y_i - y_{g+1})^2 + 2g \sum_{i=g+1}^{n-g} (y_{n-g} - y_i)^2 + 2g^2 (y_{n-g} - y_{g+1})^2$    (32) 

whence, setting i = g+r, j = g+s, and using (30), 

$2n(n+1)(n+2)\; \mathrm{ave}\; SSD_{Wg} = 4g \sum_{r=1}^{h} r(r-1) + \sum_{r=1}^{h} \sum_{s=1}^{h} |r-s|\,(|r-s|+1) + 2g^2 (h-1)h$.    (33) 

Now $\sum r(r-1) = (h+1)(h)(h-1)/3$. And the double sum can be easily 
evaluated by considering a sample of size h, from the same rectangular distribution, 
and the corresponding untrimmed SSD, say SSD(z), for which 

$2h \cdot SSD(z) = \sum_r \sum_s (z_r - z_s)^2$    (34) 

$\mathrm{ave}\; SSD(z) = (h-1)\sigma^2 = \frac{h-1}{12}$.    (35) 

Now write (33) for n = h and g = 0, finding 

$2h(h+1)(h+2)\; \mathrm{ave}\; SSD(z) = \sum_r \sum_s |r-s|\,(|r-s|+1)$    (36) 

but the left-hand side equals 

$2h(h+1)(h+2) \cdot \frac{h-1}{12} = \frac{1}{6}(h-1)h(h+1)(h+2)$    (37) 

so that 

$\frac{1}{6}(h-1)h(h+1)(h+2) = 2h(h+1)(h+2)\; \mathrm{ave}\; SSD(z) = \sum_r \sum_s |r-s|\,(|r-s|+1)$.    (38) 

Hence 

$2n(n+1)(n+2)\; \mathrm{ave}\; SSD_{Wg} = \frac{4g}{3}(h+1)h(h-1) + \frac{1}{6}(h-1)h(h+1)(h+2) + 2g^2(h-1)h$    (39) 

and 

$\mathrm{ave}\; SSD_{Wg} = \frac{(h-1)h}{12n(n+1)(n+2)} \left[ (h+1)(h+2) + 8g(h+1) + 12g^2 \right] = \frac{(h-1)h}{12n(n+1)(n+2)} \left[ 3n^2 - (h-2)(2n+1) \right]$.    (40) 

We turn now to 

$h^2\, \mathrm{var}\; y_{Tg} = \sum \sum \mathrm{cov}(y_i, y_j)$    (41) 

and notice that, if $r \le s$, 

$(n+1)^2(n+2)\; \mathrm{cov}(y_{g+r}, y_{g+s}) = (g+r)(n+1-g-s) = (g+r)(g+h+1-s)$ 
$\quad = g^2 + g(r-s) + g(h+1) + (h+1)^2(h+2)\; \mathrm{cov}(z_r, z_s)$ 
$\quad = g^2 - g|r-s| + g(h+1) + (h+1)^2(h+2)\; \mathrm{cov}(z_r, z_s)$    (42) 

where the requirement that $r \le s$ can be lifted for the last form, so that, using (41), 

$h^2 (n+1)^2(n+2)\; \mathrm{var}\; y_{Tg} = h^2 g^2 - g \sum_r \sum_s |r-s| + h^2(h+1)g + (h+1)^2(h+2)\,(h^2\, \mathrm{var}\; z_\cdot)$    (43) 

and, since $h^2\, \mathrm{var}\; z_\cdot = h/12$ and $\sum_r \sum_s |r-s| = h(h^2-1)/3$, we have 

$\mathrm{var}\; y_{Tg} = \frac{(h+1)^2(h+2) + 4(h+1)(2h+1)g + 12hg^2}{12h(n+1)^2(n+2)} = \frac{h(3n-2h+3)+2}{12h(n+1)(n+2)}$. 

Now 

$\frac{\mathrm{ave}\; SSD_{Wg}}{h(h-1)\; \mathrm{var}\; y_{Tg}} = \frac{h(3n^2-2nh+4n-h+2)}{n\,(h(3n-2h+3)+2)} = 1 + \frac{(n-h)(h-2)}{n\,(h(3n-2h+3)+2)}$ 

which will usually be close to 1. 
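
The closed forms of this section can be checked in exact rational arithmetic. The sketch below assumes only the standard moments of rectangular order statistics: it builds the average Winsorized SSD from the pairwise expectations of squared differences, builds the variance of the trimmed mean from the covariances, and compares both, and their ratio, with the closed-form expressions. Function names are illustrative.

```python
from fractions import Fraction


def ave_winsorized_ssd(n, g):
    """ave SSD_W via (1/2n) * sum over all pairs of ave (w_k - w_l)^2."""
    # Index of the order statistic standing in position k after Winsorizing.
    idx = [min(max(k, g + 1), n - g) for k in range(1, n + 1)]
    total = Fraction(0)
    for i in idx:
        for j in idx:
            d = abs(i - j)
            total += Fraction(d * (d + 1), (n + 1) * (n + 2))
    return total / (2 * n)


def var_trimmed_mean(n, g):
    """var of the mean of y_(g+1), ..., y_(n-g) from the order-statistic covariances."""
    h = n - 2 * g
    total = Fraction(0)
    for i in range(g + 1, n - g + 1):
        for j in range(g + 1, n - g + 1):
            lo, hi = min(i, j), max(i, j)
            total += Fraction(lo * (n + 1 - hi), (n + 1) ** 2 * (n + 2))
    return total / (h * h)


def closed_form_ratio(n, g):
    """1 + (n-h)(h-2) / (n (h(3n-2h+3)+2)), the ratio quoted at the end of the section."""
    h = n - 2 * g
    return 1 + Fraction((n - h) * (h - 2), n * (h * (3 * n - 2 * h + 3) + 2))


if __name__ == "__main__":
    for n, g in [(5, 1), (10, 2), (20, 5)]:
        h = n - 2 * g
        ratio = ave_winsorized_ssd(n, g) / (h * (h - 1) * var_trimmed_mean(n, g))
        print(n, g, ratio, float(ratio), ratio == closed_form_ratio(n, g))
```

For n = 5 and g = 1, for example, the ratio is 96/95, which illustrates the closing remark that the divisor h(h-1) is nearly, though not exactly, unbiased in the rectangular case.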

THE BEHRENS-FISHER TEST WHEN THE RANGE OF THE 
UNKNOWN VARIANCE RATIO IS RESTRICTED* 


By WILLIAM G. COCHRAN 
Harvard University 


SUMMARY. In some applications of the Behrens-Fisher test, it is reasonable to suppose that 
the unknown variance ratio $\sigma_1^2/\sigma_2^2$ must exceed a known quantity $\lambda_1/\lambda_2$. The effect of this restriction on 
the significance levels of the test statistic d is examined by computing the probability that d exceeds the 
tabulated significance level, in the restricted region, for 6, 12, and 24 degrees of freedom in $s_1^2$ and $s_2^2$ 
and probability levels of 5% and 1%. If the F test made from the data gives strong support to the 
supposition that $\sigma_1^2/\sigma_2^2 > \lambda_1/\lambda_2$, the disturbance to the Behrens-Fisher significance levels is minor, but otherwise 
it can be substantial and can lie in either direction. The practical use of these results is discussed. 


1. INTRODUCTION 


In the Behrens-Fisher problem we are given a comparison x, normally 
distributed with mean $\mu$ and variance $(\sigma_1^2+\sigma_2^2)$. We also have independent estimates 
$s_1^2$ of $\sigma_1^2$ and $s_2^2$ of $\sigma_2^2$, based on $n_1$ and $n_2$ degrees of freedom. The problem is to test the 
null hypothesis that $\mu$ has some stated value, usually zero. The test criterion is 

$d = (x-\mu)/\sqrt{s_1^2+s_2^2}$. 

Although the Behrens-Fisher test is intended to involve no assumptions 
about the variance ratio $\sigma_1^2/\sigma_2^2$, there are practical problems in which it is reasonable 
to assume that $\sigma_1^2/\sigma_2^2$ must exceed some known value. To cite an example discussed 
by Fisher (1941), the mean of a large sample of $(n_1+1)$ crude measurements of some 
physical quantity may be compared with the mean of a smaller number $(n_2+1)$ of 
refined measurements, in order to examine whether the two processes are biased relative 
to one another. If $\sigma_a$, $\sigma_b$ are the standard deviations of the populations of crude 
and refined measurements, respectively, the assumption $\sigma_a/\sigma_b > 1$ appears justified. 
In applying the Behrens-Fisher test to this problem, x is the difference between the 
sample means and 

$s_1^2 = \frac{s_a^2}{n_1+1}, \qquad s_2^2 = \frac{s_b^2}{n_2+1}$. 

Consequently, the restriction $\sigma_a^2/\sigma_b^2 > 1$ implies that $\sigma_1^2/\sigma_2^2 > (n_2+1)/(n_1+1)$. 


* This research was supported by the Contract Nonr 1866(37) with the Office of Naval Research, 
U. S. Navy Department. The computations were done at the Harvard Computing Center under Grant 
NSF-GP-683 by the National Science Foundation. 

This situation occurs also in certain comparisons in the analysis of split-plot 
or nested experiments. Factor A, with a levels, is applied to relatively large plots 
or experimental units in some standard design. Each unit is divided into b equal 
subunits, to which are applied the b levels of a second factor B. The mathematical 
model used in the analysis postulates that the error $e_{ijk}$ on the subunit receiving the i-th 
level of A, the j-th level of B and lying in the k-th replication has mean zero and 
variance $\sigma^2$. The errors $e_{ijk}$, $e_{i'j'k'}$ on subunits in different units are assumed 
independent, but errors $e_{ijk}$, $e_{ij'k}$ on two subunits in the same unit have a 
correlation $\rho$. It follows that the error variance of a unit total, when divided so as to 
express it on a subunit basis, is 

$\sigma_a^2 = \sigma^2\{1+(b-1)\rho\}$.    (1) 

However, the variance of the difference between two subunits in the same unit, also 
on a single subunit basis, is 

$\sigma_b^2 = \sigma^2(1-\rho)$.    (2) 

An unbiased estimate $s_a^2$ of $\sigma_a^2$, obtained from the analysis of variance of unit 
totals, is used for testing the main effects of A. The subunit analysis supplies an 
independent estimate $s_b^2$ of $\sigma_b^2$ for tests involving main effects of B and AB interactions. 

If interactions are present and B is a qualitative factor, the experimenter may 
wish to test comparisons like $(a_2b_1)-(a_1b_1)$: that is, the difference between the means 
for two A levels at the same level of B. In terms of the model, the error variance of 
such comparisons is $2\sigma^2/r$, where r is the number of replications. From equations 
(1) and (2) an unbiased estimate of this variance is 

$\frac{2\{s_a^2+(b-1)s_b^2\}}{rb}$. 

If x is the mean difference between $(a_2b_1)$ and $(a_1b_1)$, the Behrens-Fisher test 
may be used if we take 

$s_1^2 = \frac{2s_a^2}{rb}, \qquad s_2^2 = \frac{2(b-1)s_b^2}{rb}$    (3) 

giving to $s_1^2$ the d.f. in $s_a^2$ and to $s_2^2$ those in $s_b^2$. 
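
As a small worked illustration of (3), the following sketch forms the two variance components entering the Behrens-Fisher criterion from split-plot mean squares and then the criterion d itself. The numerical inputs are invented for the example, and the function names are illustrative.

```python
import math


def split_plot_components(s_a2, s_b2, b, r):
    """Equation (3): s1^2 = 2 s_a^2 / (rb), s2^2 = 2 (b-1) s_b^2 / (rb)."""
    s1_sq = 2.0 * s_a2 / (r * b)
    s2_sq = 2.0 * (b - 1) * s_b2 / (r * b)
    return s1_sq, s2_sq


def behrens_fisher_d(x, s1_sq, s2_sq, mu0=0.0):
    """Test criterion d = (x - mu0) / sqrt(s1^2 + s2^2)."""
    return (x - mu0) / math.sqrt(s1_sq + s2_sq)


if __name__ == "__main__":
    # Hypothetical mean squares: s_a^2 = 8 (unit analysis), s_b^2 = 4 (subunit analysis),
    # with b = 2 subunits per unit and r = 4 replications.
    s1_sq, s2_sq = split_plot_components(8.0, 4.0, b=2, r=4)
    print(s1_sq, s2_sq, s1_sq + s2_sq)
    print(behrens_fisher_d(3.0, s1_sq, s2_sq))
```

The sum of the two components reproduces the unbiased estimate 2{s_a^2 + (b-1) s_b^2}/(rb) of the error variance, and their ratio is s_a^2 / ((b-1) s_b^2).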


In many split-plot or nested experiments there is reason to believe that $\rho > 0$. 
From equations (1) and (2) this implies that $\sigma_a^2/\sigma_b^2 > 1$ and hence in (3) that 
$\sigma_1^2/\sigma_2^2 > 1/(b-1)$. 

The objective of this paper is to make a preliminary exploration of the 
disturbance produced in the Behrens-Fisher test when the range of $\sigma_1^2/\sigma_2^2$ is restricted in 
this way. 


2. NUMERICAL EXAMPLE 

The following notation covers the type of problem illustrated by the two 
preceding examples. Let $\sigma_a^2$ and $\sigma_b^2$ be the variances in the two populations in which 
we are prepared to assume that $\sigma_a^2/\sigma_b^2 > 1$. Let $f' = s_a^2/s_b^2$ be the corresponding 
estimated variance ratio based on $n_1$ and $n_2$ degrees of freedom. The value of f′ will 
be known from the data. Further, let 

$\sigma_1^2 = \lambda_1\sigma_a^2, \qquad \sigma_2^2 = \lambda_2\sigma_b^2$    (4) 

where $\sigma_1^2$ and $\sigma_2^2$ are the variances that enter into the Behrens-Fisher test, $\lambda_1$ and $\lambda_2$ 
being known numbers that depend on the problem in question. Then the restriction 
may be written 

$\frac{\sigma_1^2}{\sigma_2^2} > \frac{\lambda_1}{\lambda_2}$.    (5) 

For later calculations it is convenient to rewrite this as 

$\frac{\phi}{f} > \frac{1}{f'}$    (6) 

where $f = s_1^2/s_2^2 = \lambda_1 f'/\lambda_2$. 

The effect of this additional information about $\phi = \sigma_1^2/\sigma_2^2$ on the significance 
levels of d may be examined from a result due to Fisher, who showed that the 
unrestricted significance levels of d can be computed in the following way. The probability 
that d exceeds a specified value is found first on the assumption that $f = s_1^2/s_2^2$ and 
$\phi = \sigma_1^2/\sigma_2^2$ are both known. The average value of this probability over all possible 
values of $\phi$ from 0 to $\infty$ is then calculated by assigning to $\phi/f$ its fiducial distribution 
for known f, this distribution being the tabular F distribution with $n_2$ and $n_1$ d.f. 

To obtain the frequency distribution of d when f and $\phi$ are both given, Fisher 
notes that $x/\sqrt{\sigma_1^2+\sigma_2^2}$ follows the standard normal distribution and that 

$\frac{n_1 s_1^2}{\sigma_1^2} + \frac{n_2 s_2^2}{\sigma_2^2}$ 

follows the $\chi^2$ distribution with $(n_1+n_2)$ d.f. Hence a variate that follows Student’s 
distribution with $(n_1+n_2)$ d.f. is 

$t = \frac{x\sqrt{n_1+n_2}}{\sqrt{(\sigma_1^2+\sigma_2^2)\left(\frac{n_1 s_1^2}{\sigma_1^2}+\frac{n_2 s_2^2}{\sigma_2^2}\right)}} = \frac{d\,\sqrt{s_1^2+s_2^2}\,\sqrt{n_1+n_2}}{\sqrt{(\sigma_1^2+\sigma_2^2)\left(\frac{n_1 s_1^2}{\sigma_1^2}+\frac{n_2 s_2^2}{\sigma_2^2}\right)}} = d\,\sqrt{\frac{(n_1+n_2)(1+f)}{\left(\frac{n_1 f}{\phi}+n_2\right)(1+\phi)}}$.    (7) 

By means of this relation, the probability that d exceeds any specified value, for given 
f and $\phi$, is read from the Student t-table with $(n_1+n_2)$ degrees of freedom. This 
probability is then averaged over the fiducial distribution of $\phi/f$ from 0 to $\infty$. 

When it is known that $\phi/f > 1/f'$, the natural modification of the Behrens-Fisher 
technique is to average this probability only over the values of $\phi/f$ that exceed 
1/f′. This is the method that will be adopted in this paper. 

From the preceding discussion, writing $u = \phi/f$, the probability that d exceeds 
the value $d_\alpha$ when we confine ourselves to the restricted region u > 1/f′ is given by the 
expression 

$P = \frac{\int_{1/f'}^{\infty} u^{n_2/2-1}\,(n_1+n_2u)^{-(n_1+n_2)/2}\; P\{|t| > d_\alpha\, g(u)\}\, du}{\int_{1/f'}^{\infty} u^{n_2/2-1}\,(n_1+n_2u)^{-(n_1+n_2)/2}\, du}$    (8) 

where 

$g(u) = \left[\frac{(n_1+n_2)(1+f)\,u}{(n_1+n_2u)(1+fu)}\right]^{1/2}$ 

and $P\{|t| > d_\alpha\, g(u)\}$ is the two-tailed probability that Student’s t with $(n_1+n_2)$ degrees 
of freedom exceeds $d_\alpha\, g(u)$. 
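
An average of the kind given in (8) can be evaluated with modest means. The sketch below is an illustration under stated assumptions, not the program used for the paper: it takes the weight for u to be the F density with $n_2$ and $n_1$ d.f., changes variable to $v = n_2u/(n_1+n_2u)$ so that the range of integration is finite, applies Simpson's rule, and obtains the two-tailed Student probability from the finite trigonometric series valid for integer degrees of freedom. It assumes $n_1, n_2 \ge 2$; all function names are illustrative.

```python
import math


def two_tailed_t(t, df):
    """P(|T| > t) for Student's t with integer df, by the finite trigonometric series."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (2 * k - 1) / (2 * k) * c * c
            total += term
        inside = s * total
    else:
        total = 0.0
        if df > 1:
            term = c
            total = term
            for k in range(1, (df - 1) // 2):
                term *= (2 * k) / (2 * k + 1) * c * c
                total += term
        inside = (2.0 / math.pi) * (theta + s * total)
    return 1.0 - inside


def g_factor(u, n1, n2, f):
    """g(u) = sqrt((n1+n2)(1+f)u / ((n1+n2 u)(1+f u)))."""
    return math.sqrt((n1 + n2) * (1 + f) * u / ((n1 + n2 * u) * (1 + f * u)))


def averaged_probability(d, n1, n2, f, u_min=0.0, steps=4000):
    """Average of P{|t| > d g(u)} over u > u_min, weighted by the F(n2, n1) density."""
    v_lo = n2 * u_min / (n1 + n2 * u_min)
    width = (1.0 - v_lo) / steps
    num = den = 0.0
    for i in range(steps + 1):
        v = 1.0 if i == steps else v_lo + i * width
        coef = 1 if i in (0, steps) else (4 if i % 2 else 2)
        weight = v ** (n2 / 2 - 1) * (1.0 - v) ** (n1 / 2 - 1)
        if v <= 0.0 or v >= 1.0:
            prob = 1.0  # g(u) tends to 0 at both ends of the range
        else:
            u = n1 * v / (n2 * (1.0 - v))
            prob = two_tailed_t(d * g_factor(u, n1, n2, f), n1 + n2)
        num += coef * weight * prob
        den += coef * weight
    return num / den


if __name__ == "__main__":
    # Split-plot illustration: n1 = 6, n2 = 12, f = f' = 1, d = 2.301.
    print(averaged_probability(2.301, 6, 12, 1.0))             # whole range
    print(averaged_probability(2.301, 6, 12, 1.0, u_min=1.0))  # restricted to u > 1
```

For equal degrees of freedom and f = f′ = 1 the integrand is symmetric under u to 1/u, so the restricted and unrestricted averages must coincide for any d; this provides a convenient check on an implementation.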


As an illustration, consider a split-plot experiment with $n_1 = 6$, $n_2 = 12$, 
b = 2. These parameters hold if the main units are arranged in a 4x4 latin square, 
each unit having two subunits. In the solid line in Figure 1 the probability that d 
exceeds 2.301 (the 5% Behrens-Fisher value for $n_1 = 6$, $n_2 = 12$, f = 1) is plotted 
against the percentiles of the distribution of $\phi/f$. The average probability (area under 
the line) is of course 0.05. Throughout most of the range of $\phi/f$ the probability lies 
below 0.05, this being required to compensate for very high and low values of 
$\phi$ that make the probability rise steeply towards 1 at both ends. Note that 
probabilities above 0.05 are mostly contributed by high values of $\phi/f$. This happens 
whenever $n_2 > n_1$. 


[Figure 1: probability (vertical axis) against percentiles of $\phi/f$ (horizontal axis, 0 to 1.0).] 

Fig. 1. Probability of exceeding $d_{.05}$ for given f and $\phi/f$ ($n_1 = 6$, $n_2 = 12$). 


Suppose that in the split-plot experiment the investigator finds $s_a^2 = s_b^2$, i.e. 
f′ = 1. For a split-plot, f = f′/(b-1), so that since b = 2, f = f′ = 1. Hence the 
region over which the probability must be averaged is $\phi/f > 1$. This is 
approximately the region to the right of the median 0.5 on the abscissa. Eye inspection 
suggests that the average probability in the restricted region will exceed 0.05. 
Numerical integration gives 0.063. 

The dotted line in Figure 1 shows the corresponding probabilities for f = f′ = 1.59. 
Since this value is at the 25% significance level of f′ it gives slightly greater support 
to the idea that $\sigma_a^2/\sigma_b^2 > 1$ than does our previous choice of f′ = 1. For f′ = 1.59 
the restricted region extends from 0.25 to 1 on the abscissa. The average probability 
in this region is 0.057. As f′ increases further, the restricted region increases in size 
and the average probability moves toward 0.05. For f′ lower than 1 the average 
probability is higher than that for f′ = 1, being, for example, about 0.084 when 
f′ = 1/3. (In each case the probabilities are those of exceeding the 5% value of d 
for $n_1 = 6$, $n_2 = 12$ and the appropriate f.) Similar results hold at the 1% level. 
The average probability in the curtailed region is 0.015 when f′ = 1 and 0.012 when 
f′ = 1.59. 

3. RANGE COVERED BY THE COMPUTATIONS 


In order to investigate these effects more systematically, a series of calculations 
were made of the actual probability with which d exceeds $d_{.05}$ and $d_{.01}$, the 
tabulated Behrens-Fisher significance levels, in the restricted region. The values 
chosen for $n_1$ and $n_2$ were 6, 12 and 24, giving nine combinations. The quantity f′ 
was taken at its 75%, 50%, 25% and 5% levels. With $f' = s_a^2/s_b^2$ at its 5% level, the 
data are confirming the investigator’s idea that $\sigma_a^2/\sigma_b^2 > 1$, while when f′ is at its 75% 
level, the data are tending to disagree with this a priori assumption. It was not thought 
necessary to investigate the situations in which f′ is at its 1% or 0.1% levels, although 
such cases might be expected to occur commonly in practice when $\sigma_a/\sigma_b > 1$. When 
f′ is at these levels the restricted range is very close to the whole sample space of $\phi/f$, 
so that the ordinary significance levels of d are unlikely to be much in error. 

Expression (8) shows that the probability depends on f as well as on $n_1$, $n_2$ 
and f′. For the two-sample comparison in Section 1, 

$f = (n_2+1)f'/(n_1+1)$. 

Thus f/f′ is usually close to $n_2/n_1$, lying between this value and unity. In a split-plot 
experiment f/f′ is easily seen to be $n_1/n_2$ if the main units are arranged in a completely 
randomized design. It is $an_1/((a-1)n_2)$ when main units are in randomized blocks 
and $an_1/((a-2)n_2)$ when main units form a latin square, where a is the number of levels 
of factor A. In all three split-plot cases f/f′ lies between $n_1/n_2$ and unity. 
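
These relations follow from the usual degrees of freedom of the split-plot analysis together with f/f′ = 1/(b-1). The sketch below checks them in exact arithmetic. The degrees-of-freedom formulas are the standard textbook ones and are stated here as assumptions: main-unit error a(r-1), (a-1)(r-1) and (a-1)(a-2) for the three designs, and subunit error a(r-1)(b-1), with r = a for the latin square.

```python
from fractions import Fraction


def split_plot_df(design, a, b, r):
    """Return (n1, n2): main-unit and subunit error degrees of freedom."""
    if design == "crd":
        n1 = a * (r - 1)
    elif design == "rcb":
        n1 = (a - 1) * (r - 1)
    elif design == "latin":
        r = a  # a latin square has as many replications as levels of A
        n1 = (a - 1) * (a - 2)
    else:
        raise ValueError("unknown design")
    n2 = a * (r - 1) * (b - 1)
    return n1, n2


def ratio_from_df(design, a, n1, n2):
    """f/f' expressed through n1 and n2, as quoted in the text for each design."""
    if design == "crd":
        return Fraction(n1, n2)
    if design == "rcb":
        return Fraction(a * n1, (a - 1) * n2)
    return Fraction(a * n1, (a - 2) * n2)


if __name__ == "__main__":
    for design, a, b, r in [("crd", 3, 4, 5), ("rcb", 5, 3, 4), ("latin", 4, 2, 4)]:
        n1, n2 = split_plot_df(design, a, b, r)
        print(design, n1, n2, ratio_from_df(design, a, n1, n2), Fraction(1, b - 1))
```

The 4x4 latin square with two subunits per unit gives $n_1 = 6$, $n_2 = 12$ and f/f′ = 1, the case used in the numerical illustration.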


For such applications it looks as if values of f/f′ lying between $n_1/n_2$ and $n_2/n_1$ 
are primarily of interest. In more complex situations, however, the ratio f/f′ need 
not bear any simple relation to $n_1/n_2$. Consequently the probabilities were computed 
for f/f′ = 0, 1/4, 1/2, 1, 2 and $\infty$. 

For any specific $n_1$, $n_2$, f and f′ the ordinary Behrens-Fisher significance level 
$d_\alpha$ was first computed on the IBM 7090 by interpolation in the Fisher-Yates tables. 
Expression (8) was then obtained by numerical integration, using the trigonometric 
expansion of the integral of the t-distribution. I am greatly indebted to Michael 
Feuer who programmed and conducted the calculations. After debugging, computation 
of the 432 probability values took 17.9 minutes of machine time. 

TABLE 1. PROBABILITY THAT d EXCEEDS THE BEHRENS-FISHER 5% LEVEL IN 
THE RESTRICTED REGION $\phi/f > 1/f'$ 

(Entries shown as … are not legible in the source.) 

f′ level    f → 0    f = f′/4    f = f′/2    f = f′    f = 2f′    f → ∞ 

$n_1 = 6$, $n_2 = 6$ 
75%         .012     .023        .034        .050      .070       .122 
50%         .017     .029        .039        .050      .062       .082 
25%         .026     .037        .043        .050      .055       .063 
5%          .040     .046        .048        .050      .051       .052 

$n_1 = 6$, $n_2 = 12$ 
75%         .024     .042        .055        .073      .092       .132 
50%         .030     .044        .053        .064      .073       .087 
25%         .036     .046        .052        .057      .060       .064 
5%          .045     .049        .050        .051      .052       .052 

$n_1 = 6$, $n_2 = 24$ 
75%         .035     .055        .069        .087      .105       .139 
50%         .039     .053        .062        .071      .079       .090 
25%         .043     .052        .056        .060      .063       .065 
5%          .048     .050        .051        .052      .052       .053 

$n_1 = 12$, $n_2 = 6$ 
75%         …        .013        .019        .030      .045       .091 
50%         …        .021        .027        .036      .047       .070 
25%         …        .030        .036        .042      .048       .058 
5%          …        .044        .046        .048      .050       .052 

$n_1 = 12$, $n_2 = 12$ 
75%         .019     …           .037        .050      .065       .100 
50%         …        …           .041        .050      .059       .075 
25%         .033     .041        .045        .050      .054       .060 
5%          .044     .047        .049        .050      …          .052 

$n_1 = 12$, $n_2 = 24$ 
75%         .030     .042        .051        .064      .078       .107 
50%         .035     .045        .051        .059      …          .078 
25%         .041     .047        .051        …         …          … 
5%          .047     .049        .050        .051      .052       .052 

$n_1 = 24$, $n_2 = 6$ 
75%         .004     .008        .012        …         .032       .072 
50%         .010     .016        …           …         .038       … 
25%         .020     .027        .032        …         .044       .055 
5%          .038     .042        .044        .047      .048       .051 

$n_1 = 24$, $n_2 = 12$ 
75%         .014     .021        …           .037      .048       .078 
50%         .022     .029        …           .041      .049       .065 
25%         .031     .037        …           .045      .050       .057 
5%          .043     .046        .047        .049      .050       .051 

$n_1 = 24$, $n_2 = 24$ 
75%         .026     .034        …           .050      .060       .084 
50%         .032     .039        …           .050      .056       .068 
25%         .039     …           …           .050      .053       .058 
5%          .046     .048        …           .050      .051       .052 

TABLE 2. PROBABILITY THAT d EXCEEDS THE BEHRENS-FISHER 1% LEVEL IN 
THE RESTRICTED REGION $\phi/f > 1/f'$ 

f′ level    f → 0    f = f′/4    f = f′/2    f = f′    f = 2f′    f → ∞ 

$n_1 = 6$, $n_2 = 6$ 
75%         .0007    .0024       .0048       .0096     .0165      .0322 
50%         .0014    .0037       .0062       .0100     .0137      .0186 
25%         .0026    .0055       .0078       .0101     .0117      .0131 
5%          .0059    .0084       .0094       .0100     .0103      .0105 

$n_1 = 6$, $n_2 = 12$ 
75%         .0030    .0073       .0118       .0182     .0251      .0348 
50%         .0040    .0082       .0112       .0147     .0172      .0194 
25%         .0055    .0090       .0107       .0121     .0129      .0133 
5%          .0080    .0097       .0102       .0104     .0105      .0105 

$n_1 = 6$, $n_2 = 24$ 
75%         .0055    .0118       .0170       .0235     .0294      .0363 
50%         .0065    .0112       .0142       .0168     .0185      .0197 
25%         .0076    .0107       .0120       .0128     .0131      .0133 
5%          .0091    .0102       .0104       .0105     .0105      .0105 

$n_1 = 12$, $n_2 = 6$ 
75%         .0002    .0007       .0014       .0036     .0078      .0234 
50%         .0006    .0016       .0028       .0053     .0087      .0160 
25%         .0018    .0033       .0050       .0072     .0094      .0123 
5%          .0053    .0072       .0083       .0092     .0099      .0104 

$n_1 = 12$, $n_2 = 12$ 
75%         .0018    .0037       .0059       .0098     .0150      .0263 
50%         .0028    .0051       .0071       .0100     .0129      .0172 
25%         .0045    .0067       .0083       .0100     .0113      .0127 
5%          .0075    .0089       .0095       .0100     .0103      .0105 

$n_1 = 12$, $n_2 = 24$ 
75%         .0042    .0075       .0105       .0150     .0198      .0286 
50%         .0054    .0083       .0104       .0129     .0151      .0180 
25%         .0069    .0090       .0102       .0114     .0122      .0130 
5%          .0088    .0097       .0101       .0103     .0105      .0105 

$n_1 = 24$, $n_2 = 6$ 
75%         .0001    .0002       .0006       .0015     .0039      .0172 
50%         .0003    .0008       .0015       .0031     .0058      .0135 
25%         .0012    .0023       .0035       .0054     .0077      .0115 
5%          .0048    .0064       .0075       .0086     .0094      .0103 

$n_1 = 24$, $n_2 = 12$ 
75%         .0010    .0021       .0033       .0057     .0093      .0194 
50%         .0020    .0035       .0049       .0071     .0096      .0146 
25%         .0038    .0054       .0067       .0083     .0098      .0119 
5%          .0072    .0083       .0089       .0096     .0100      .0103 

$n_1 = 24$, $n_2 = 24$ 
75%         .0031    .0051       .0070       .0099     .0135      .0214 
50%         .0045    .0064       .0080       .0100     .0120      .0155 
25%         .0062    .0078       .0088       .0100     .0110      .0123 
5%          .0085    .0093       .0097       .0100     .0102      .0104 
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4. RESULTS 


_..—Tables 1 (5% level) and 2 (1% level) show the probability that d exceeds. 
the 5% or 1% level in the Behrens-Fisher table when we make the additional assump- 
tion that o7/o3 > 1. The results are not too simple to summarize and digest, but the 

“following points emerge. 


` (1) If one tries to guess the direction of the results intuitively, the easiest 
case is that in which f =f’, so that the restriction becomes oj/o3 > 1. It seems 
natural (at least to me) to guess that this restriction produces the same effect as that of 
an increase in s7/s3 on the ordinary Behrens-Fisher levels, because the additional infor- 
mation suggests that the stability of d will now depend more on the 'accuracy with 
which o? is estimated than it does in the unrestricted case. When n, < ng, an increase 
in s?/s3 raises the value of d required for 5% significance in the Behrens-Fisher tables 
for the range of values of Ny, Ng considered here. Consequently the restriction should 
produce probabilities greater than 0.05 or 0.01 in Tables 1 and 2 when f=f' and nj <ne. 
Similarly, the restricted probabilities should be less than 0.05 or 0.01 when f=f 
and n>n, These anticipations are verified in every case in Tables 1 and 2. 


When $f = f'$ and $n_1 = n_2$, the same intuition suggests that the probabilities should not be disturbed. To the degree of accuracy shown in Tables 1 and 2 this happens in all 12 cases at the 5% level and in all but four cases at the 1% level. The discrepancies in these four cases are so small that they may be due to rounding errors in the calculations. I have tried to prove by integration that the probability for this set of cases remains exactly at 5% or 1% but have not succeeded, except for the case $f = f' = 1$ in which the result is obvious by symmetry.


(2) As anticipated, when $f'$ is at the 5% level the probabilities remain close to those in the Behrens-Fisher tables except when $f$ is very small or very large. As $f'$ diminishes, the disturbance to the probabilities steadily increases, becoming very substantial when $f'$ is at the 50% and 75% levels.


(3) For given $f'$, $n_1$ and $n_2$ the probability increases monotonically with $f$, as can be verified mathematically.
(4) Although the panel of values of $n_1$ and $n_2$ is not large enough to suggest firm rules of interpolation against these values, it appears in all cases that for fixed $f$ and $f'$, the probability moves towards 0.05 or 0.01 as $(n_1 + n_2)$ increases.


APPLICATION TO DATA
limited. It is clear that for the sample sizes considered, the disturbance to the Behrens-Fisher probabilities can be substantial and can lie in either direction, and no simple rationalization of the whole pattern of results has occurred to me. A more extensive table of the 5% and 1% significance levels of $d$ in the restricted region is perhaps called for, though as a four-variable table it would be inconvenient to use.


The result that the disturbance to the Behrens-Fisher values is minor (except for very small values of $f$ relative to $f'$) when $f'$ is at or beyond the 5% level will often be all that the investigator needs to know. In looking for examples that might be typical of the split-plot case, I noticed that in the experiments reported in the well-known books by Snedecor, Federer, Bennett and Franklin, and Cochran and Cox, all the $f'$ values were beyond the 0.5% level ($P < 0.005$) and three of the four were beyond the 0.1% level.


The result that there is either no disturbance or at most a trifling disturbance when $n_1 = n_2$ and $f = f'$ is also useful, since this applies to the comparison of two samples of equal sizes (and to a split-plot with two subunit treatments and main units completely randomized).


For the split-plot experiment with main units completely randomized, in randomized blocks, or in a latin square, we have, as noted previously, $n_1 < n_2$ and $f/f'$ lying between $n_1/n_2$ and unity. In all such cases in Tables 1 and 2 with $n_1 < n_2$, the Behrens-Fisher significance level is too low for the restricted region, but usually by only a small amount. As an example, the experiment quoted in Goulden's book (1952) has $n_1 = 35$, $n_2 = 42$, $f = f' = 1.17$. This lies at about the 32% level. For $n_1 = 12$, $n_2 = 24$ in Table 1, the probability for $d_{.05}$ and $f = f'$ is 0.054 when $f'$ is at the 25% level and 0.059 when $f'$ is at the 50% level. For $n_1 = 21$, $n_2 = 42$ both probabilities presumably move towards 0.05, with a further move in this direction when $n_1 = 35$, $n_2 = 42$. Hence the probability in the restricted region may be expected to be only slightly above 0.05 in Goulden's example.


In the comparison of two samples of unequal sizes, $f/f' = (n_2+1)/(n_1+1)$, which will be approximately $n_2/n_1$ in most practical situations. Two cases may be distinguished. If $n_1 > n_2$, i.e. the less precise sample is larger, Tables 1 and 2 indicate that the probabilities are smaller than 0.05 or 0.01. For instance, in Table 1 with $n_1 = 24$, $n_2 = 6$, $f/f' = 1/4$, the 0.05 probabilities drop to 0.042, 0.027, 0.016 and 0.008. The disturbances can clearly be large if $n_1$ is much greater than $n_2$. The probabilities imply, of course, that a smaller value of $d$ is needed for 5% significance than that given in the Behrens-Fisher tables. If $n_1 < n_2$, on the other hand, the probabilities are higher than the stipulated 0.05 and 0.01, as seen from $n_1 = 6$, $n_2 = 12$, $f/f' = 2$, and from $n_1 = 12$, $n_2 = 24$, $f/f' = 2$.

In conclusion, it is hoped that by using Tables 1 and 2 as illustrated above, the investigator can find out the direction of the disturbance to the Behrens-Fisher significance levels and obtain some idea as to whether the disturbance is likely to be minor or major.

A TOLERANCE REGION FOR MULTIVARIATE NORMAL
DISTRIBUTIONS


By S. JOHN
Indian Statistical Institute


SUMMARY. This paper develops a method of determining, from a random sample from a $p$-variate normal population, a region regarding which it can be asserted that, with probability $\beta$, a proportion not less than $\alpha$ of the individuals in the population are contained in it. Solutions to some related problems are given in the final section.
1. INTRODUCTION 


In most statistical populations, whether they be populations of income or of 
blood pressure or of tensile strength of metal castings, a preponderant majority of 
individuals are concentrated over a relatively narrow range. This enables us, in many 
situations, to act as if individuals falling outside such intervals did not exist. Theo- 
retically, information concerning such regions is implicit in the probability distribution, 
though actual determination of them is often a matter of some mathematical difficulty, 
especially when the distribution is not unidimensional. The problem of tolerance regions is that of determining, from only a random sample from the population, a region regarding which we can assert that, with probability $\beta$, a proportion not less than $\alpha$ of the individuals in the population are contained in it.


The earliest formulation of the problem of tolerance regions is that of Wilks 
(1941). Wilks discovered a simple method of determining non-parametric tolerance 
regions for univariate populations. The corresponding multivariate problem was 
solved by Wald (1943). In Wald (1942) can be found an asymptotic solution of the problem of tolerance regions for parametric families of multivariate distributions. Tukey (1947, 1948), Tukey and Scheffé (1945), Fraser (1951, 1953), Fraser and Wormleighton (1951), Murphy (1948) and Kemperman (1956) represent later work on the problem of non-parametric tolerance regions.

Though a general (asymptotic) solution of the problem of tolerance regions 
for a parametric family of distributions was given by Wald (1942), specialisation of 
his solution to particular families of distributions does not generally lead to the best 
or the simplest solution possible. On the other hand, the non-parametric solution can 
be inefficient, as demonstrated by Wilks (1941), when applied to such special families. 
For these reasons, Wald and Wolfowitz (1946) worked out a separate solution for univariate normal distributions. Our purpose in this paper is to work out such a solution for multivariate normal distributions.

2. NOTATION


We shall denote by $g_p(\mu, \Sigma)$ the density function of the multivariate normal distribution of dimension $p$, having $\mu$ for mean vector and $\Sigma$ for dispersion matrix.

For any region $R$ in the sample space, we shall set
$$\nu(R; \mu, \Sigma) = \int \cdots \int_R g_p(\mu, \Sigma)\, dx_1\, dx_2 \cdots dx_p. \tag{2.1}$$
The function $f(z; \lambda, m)$ is defined as follows:
$$f(z; \lambda, m) = e^{-\lambda} \sum_{j=0}^{\infty} \frac{\lambda^j}{j!}\, \frac{z^{m/2+j-1}\, e^{-z/2}}{2^{m/2+j}\, \Gamma(m/2+j)}. \tag{2.2}$$
It is the density function of the non-central chi-square variable of noncentrality $\lambda$ and degree of freedom $m$. Also, we shall set
$$F(z; \lambda, m) = \int_0^z f(u; \lambda, m)\, du. \tag{2.3}$$
The equation
$$F(P(\alpha; \lambda, m); \lambda, m) = \alpha \tag{2.4}$$
defines the function $P(\alpha; \lambda, m)$.


3. PROCEDURE FOR DETERMINING THE TOLERANCE REGION 


Let $\bar{z}$ be the arithmetic mean of $N$ observations of a random vector $x = (x_1, x_2, \ldots, x_p)$ distributed according to the density function $g_p(\mu, \Sigma)$. Let $V$ be a realisation of an independent Wishart variable of $n$ degrees of freedom having $n\Sigma$ for its expectation.* Denote by $R_k$ the region of all $x$-vectors satisfying the inequality
$$(x - \bar{z})\, V^{-1} (x - \bar{z})' \le k. \tag{3.1}$$
Let
$$v_1 = P(\alpha; \tfrac{1}{2} N^{-1} p, p), \tag{3.2}$$
and
$$v_2 = P(1 - \beta; 0, np). \tag{3.3}$$
Set
$$K = (v_1/v_2)\, p. \tag{3.4}$$
Regarding the region $R_K$, we can make the following assertion:
$$\operatorname{prob}\{\nu(R_K; \mu, \Sigma) \ge \alpha\} \approx \beta. \tag{3.5}$$
The difference between the two members of (3.5) is small provided $n$ and $N$ are at least moderately large.
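The recipe of this section can be sketched numerically. The block below is an illustrative sketch only, not the author's computation: it assumes the reading $v_1 = P(\alpha; \tfrac12 N^{-1}p, p)$, $v_2 = P(1-\beta; 0, np)$, $K = (v_1/v_2)p$, and it assumes that the noncentrality $\lambda$ enters as a Poisson($\lambda$) mixing parameter, which is the convention suggested by the statement that $2Nw$ is a chi-square variable. The function names `chi2_cdf`, `F`, `P` and `tolerance_K` are hypothetical helpers; only the Python standard library is used, and degrees of freedom are taken to be integers.

```python
import math


def chi2_cdf(x, m):
    """CDF of a central chi-square variable with integer m >= 1 degrees of freedom."""
    if x <= 0.0:
        return 0.0
    if m % 2 == 1:
        cdf, d = math.erf(math.sqrt(x / 2.0)), 1   # one degree of freedom
    else:
        cdf, d = 1.0 - math.exp(-x / 2.0), 2       # two degrees of freedom
    while d < m:
        # F(x; d + 2) = F(x; d) - (x/2)^(d/2) e^(-x/2) / Gamma(d/2 + 1)
        cdf -= math.exp(0.5 * d * math.log(x / 2.0) - x / 2.0
                        - math.lgamma(0.5 * d + 1.0))
        d += 2
    return max(cdf, 0.0)


def F(z, lam, m):
    """Non-central chi-square CDF, assumed to be a Poisson(lam) mixture of central CDFs."""
    total, j, weight = 0.0, 0, math.exp(-lam)
    while True:
        total += weight * chi2_cdf(z, m + 2 * j)
        j += 1
        weight *= lam / j
        if weight < 1e-15 and j > lam:
            return total


def P(prob, lam, m):
    """Solve F(P; lam, m) = prob for P by bisection."""
    lo, hi = 0.0, 1.0
    while F(hi, lam, m) < prob:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if F(mid, lam, m) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def tolerance_K(N, n, p, alpha, beta):
    """K of the tolerance region (x - zbar) V^{-1} (x - zbar)' <= K, under the stated reading."""
    v1 = P(alpha, 0.5 * p / N, p)
    v2 = P(1.0 - beta, 0.0, n * p)
    return p * v1 / v2


# Example: 30 bivariate observations, V taken as the corrected sum of products (n = 29).
print(tolerance_K(N=30, n=29, p=2, alpha=0.90, beta=0.95))
```

With these helpers, increasing $N$ lowers $v_1$ towards the central percentage point $P(\alpha; 0, p)$, which is the large-$N$ simplification mentioned in the text.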


The constant $v_2$ can be determined from the table of the percentage points of the chi-square distribution given by Fisher and Yates (1953). If $N$ is large, $P(\alpha; \tfrac{1}{2} N^{-1} p, p) \approx P(\alpha; 0, p)$. Hence when $N$ is large, $v_1$ also can be determined from these same tables. If $(p/N)$ is not small enough, we have to resort to the approximations developed by Patnaik (1949) and Abdel-Aty (1954), which would be of help in determining $v_1$.


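* The matrix of corrected sum of products calculated from a random sample of size $n+1$ satisfies our requirements.

4. PROOF OF EQUATION (3.5)


Let $M$ be any non-singular matrix. Let $R_{k,M}$ be the region of all $x$-vectors satisfying the inequality
$$(x - \bar{z}M)(M'VM)^{-1}(x - \bar{z}M)' \le k. \tag{4.1}$$
Then,
$$\nu(R_k; \mu, \Sigma) = \nu(R_{k,M}; \mu M, M'\Sigma M). \tag{4.2}$$

We now choose $A$ so that $A'\Sigma A = I$ and $A'VA$ is a diagonal matrix. Let the diagonal elements of $A'VA$ be $t_i$ $(i = 1, 2, \ldots, p)$. We can assume, without loss of generality, that $t_1 \le t_2 \le \ldots \le t_p$. The region $R_{k,A}$ is then the region of all $x$-vectors satisfying the inequality
$$\sum_{i=1}^{p} (x_i - a_i)^2 / t_i \le k, \tag{4.3}$$
where $a_i$ is the $i$-th component of the vector $\bar{z}A$.

Denote by $R^{*}_{k}$ the region of all $x$-vectors satisfying the inequality
$$\sum_{i=1}^{p} (x_i - a_i)^2 \le k. \tag{4.4}$$
Set
$$k' = (v_1/v_2)\, p\, t_1, \qquad k'' = (v_1/v_2) \sum_{i=1}^{p} t_i \tag{4.5}$$
and
$$k''' = (v_1/v_2)\, p\, t_p. \tag{4.6}$$
Equation (4.2), together with (4.3) and (3.4), leads to the following:
$$\operatorname{prob}\{\nu(R^{*}_{k'}; \mu A, I) \ge \alpha\} \le \operatorname{prob}\{\nu(R_K; \mu, \Sigma) \ge \alpha\} \le \operatorname{prob}\{\nu(R^{*}_{k'''}; \mu A, I) \ge \alpha\}. \tag{4.7}$$
Since $t_1 \le (\sum t_i)/p \le t_p$ we have also
$$\operatorname{prob}\{\nu(R^{*}_{k'}; \mu A, I) \ge \alpha\} \le \operatorname{prob}\{\nu(R^{*}_{k''}; \mu A, I) \ge \alpha\} \le \operatorname{prob}\{\nu(R^{*}_{k'''}; \mu A, I) \ge \alpha\}. \tag{4.8}$$

It is easy to demonstrate that
$$\operatorname{prob}\{\nu(R^{*}_{k'''}; \mu A, I) \ge \alpha\} - \operatorname{prob}\{\nu(R^{*}_{k'}; \mu A, I) \ge \alpha\}$$
tends to zero as $N \to \infty$. Therefore, a fortiori,
$$\left|\operatorname{prob}\{\nu(R^{*}_{k''}; \mu A, I) \ge \alpha\} - \operatorname{prob}\{\nu(R_K; \mu, \Sigma) \ge \alpha\}\right|$$
tends to zero as $N \to \infty$. Hence, equation (3.5) will be established if we show that
$$\operatorname{prob}\{\nu(R^{*}_{k''}; \mu A, I) \ge \alpha\} \approx \beta. \tag{4.9}$$
Now,
$$\nu(R^{*}_{k''}; \mu A, I) = F(k''; w, p), \tag{4.10}$$
where
$$w = \tfrac{1}{2} (\bar{z} - \mu)\, \Sigma^{-1} (\bar{z} - \mu)'. \tag{4.11}$$
Therefore, writing $v = \sum_{i=1}^{p} t_i$,
$$\begin{aligned}
\operatorname{prob}\{\nu(R^{*}_{k''}; \mu A, I) \ge \alpha\}
&= \operatorname{prob}\{F(k''; w, p) \ge \alpha\} \\
&= E_w \operatorname{prob}\{v_1 v_2^{-1} v \ge P(\alpha; w, p) \mid w\} \\
&= E_w \operatorname{prob}\{v \ge v_2 v_1^{-1} P(\alpha; w, p) \mid w\} \\
&= 1 - E_w F(v_2 v_1^{-1} P(\alpha; w, p); 0, np) \\
&= 1 - F(v_2 v_1^{-1} P(\alpha; \tfrac{1}{2} N^{-1} p, p); 0, np) \\
&\quad - E_w (w - \tfrac{1}{2} N^{-1} p) \left[ \frac{\partial}{\partial \lambda} F(v_2 v_1^{-1} P(\alpha; \lambda, p); 0, np) \right]_{\lambda = \frac{1}{2} N^{-1} p} \\
&\quad - \tfrac{1}{2} E_w (w - \tfrac{1}{2} N^{-1} p)^2 \left[ \frac{\partial^2}{\partial \lambda^2} F(v_2 v_1^{-1} P(\alpha; \lambda, p); 0, np) \right]_{\lambda = \gamma(w)}
\end{aligned} \tag{4.12}$$
by Taylor's theorem. Here $\gamma(w)$ is a function of $w$ bounded by $\tfrac{1}{2} N^{-1} p$ and $w$.

Because of (3.2) and (3.3), the second term of the last member of (4.12) is $1 - \beta$. The random variable $2Nw$ has the chi-square distribution with $p$ degrees of freedom. From this it follows that the third term is zero. Finally, it is possible to prove that $(\partial^2/\partial \lambda^2)\, F(v_2 v_1^{-1} P(\alpha; \lambda, p); 0, np)$ is bounded. Further, $E(w - \tfrac{1}{2} N^{-1} p)^2 = \tfrac{1}{2} p / N^2$. Therefore, the absolute value of the fourth term is less than $B/N^2$, where $B$ is some finite positive number. This proves that, if terms of order two in $(1/N)$ can be neglected, then
$$\operatorname{prob}\{\nu(R^{*}_{k''}; \mu A, I) \ge \alpha\} = \beta. \tag{4.13}$$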


5. ALTERNATIVE PROCEDURES

Let $\xi(t_1, t_2, \ldots, t_p)$ be any 'average' of $t_1, t_2, \ldots, t_p$. Let $v_3$ be a number such that
$$\operatorname{prob}\{\xi(t_1, t_2, \ldots, t_p) \le v_3\} = 1 - \beta. \tag{5.1}$$
Set
$$k^{iv} = v_1/v_3. \tag{5.2}$$
We can then prove, by arguments exactly similar to those employed earlier, that
$$\operatorname{prob}\{\nu(R_{k^{iv}}; \mu, \Sigma) \ge \alpha\} \approx \beta. \tag{5.3}$$

The procedure discussed in Section 3 corresponds to the choice of the arithmetic mean for $\xi$. This choice has the advantage that the exact value of $v_3$ can be determined quite easily using tables of the percentage points of the chi-square distribution. Some alternative choices for $\xi$ are considered below.


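(I) $\xi(t_1, t_2, \ldots, t_p) = (t_1 t_2 \cdots t_p)^{1/p}$.
Hoel (1937) shows that the density function of $\xi(t_1, t_2, \ldots, t_p)$ is approximately
$$\frac{c^{\frac{1}{2}p(n-p+1)}\, \xi^{\frac{1}{2}p(n-p+1)-1}\, e^{-c\xi}}{\Gamma\!\left(\tfrac{1}{2} p(n-p+1)\right)}, \tag{5.4}$$
where
$$c = \tfrac{1}{2} p \{1 - \tfrac{1}{2}(p-1)(p-2)/n\}. \tag{5.5}$$
If $p = 1$ or $2$, (5.4) is the exact density function of $\xi(t_1, t_2, \ldots, t_p)$. We can thus determine $v_3$ from tables of the chi-square distribution.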


(LI) E(t, ta, vey by) Si 2 pa h 


In this case, $p\xi$ is distributed approximately as a chi-square with $\{np - p(p+1) + 2\}$ degrees of freedom.

6. BOUNDS FOR prob $\{\nu(R_K; \mu, \Sigma) \ge \alpha\}$
If we neglect terms of order two in $(1/N)$,
$$\operatorname{prob}(t_1 \ge v_2/p) \le \operatorname{prob}\{\nu(R_K; \mu, \Sigma) \ge \alpha\} \le \operatorname{prob}(t_p \ge v_2/p). \tag{6.1}$$
These inequalities are just another version of inequalities (4.7), obtained by the application of equation (4.10).
We give below simple expressions for the distribution functions of $t_1$ and $t_p$ in the case $p = 2$. Pillai (1954) gives recurrence relations connecting distribution functions of different orders.
If $p = 2$, starting from the joint distribution of $t_1$ and $t_2$, which is given, for instance, by Fisher (1939), we can show that
$$\operatorname{prob}(t_1 \ge t) = [1 - F(2t; 0, 2n)] - [\Gamma(\tfrac{1}{2})/\Gamma(\tfrac{1}{2} n)]\, (t/2)^{\frac{1}{2}(n-1)}\, e^{-\frac{1}{2} t}\, [1 - F(t; 0, n+1)] \tag{6.2}$$
and that
$$\operatorname{prob}(t_2 \ge t) = [1 - F(2t; 0, 2n)] + [\Gamma(\tfrac{1}{2})/\Gamma(\tfrac{1}{2} n)]\, (t/2)^{\frac{1}{2}(n-1)}\, e^{-\frac{1}{2} t}\, F(t; 0, n+1). \tag{6.3}$$
From (3.3), (6.2) and (6.3) we see that both extreme members of inequalities (6.1) are very nearly equal to $\beta$ even for moderately large values of $n$. Therefore, the middle member is more so.
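The two tail expressions for $p = 2$ can be checked on the one case that has an obvious closed form. The sketch below assumes the reading of (6.2) and (6.3) as $[1 - F(2t; 0, 2n)] \mp [\Gamma(\tfrac12)/\Gamma(\tfrac12 n)](t/2)^{(n-1)/2} e^{-t/2}\,\{\cdot\}$, with $\{1 - F(t; 0, n+1)\}$ in (6.2) and $F(t; 0, n+1)$ in (6.3). For $n = 1$ the matrix $V$ has rank one, so $t_1 = 0$ and $t_2$ is a chi-square variable with 2 degrees of freedom; the formulas should then give 0 and $e^{-t/2}$. The helper names are hypothetical.

```python
import math


def chi2_cdf(x, m):
    """CDF of a central chi-square variable with integer m >= 1 degrees of freedom."""
    if x <= 0.0:
        return 0.0
    if m % 2 == 1:
        cdf, d = math.erf(math.sqrt(x / 2.0)), 1
    else:
        cdf, d = 1.0 - math.exp(-x / 2.0), 2
    while d < m:
        # F(x; d + 2) = F(x; d) - (x/2)^(d/2) e^(-x/2) / Gamma(d/2 + 1)
        cdf -= math.exp(0.5 * d * math.log(x / 2.0) - x / 2.0
                        - math.lgamma(0.5 * d + 1.0))
        d += 2
    return max(cdf, 0.0)


def _gap_term(t, n):
    """[Gamma(1/2) / Gamma(n/2)] (t/2)^((n-1)/2) e^(-t/2), the factor shared by both formulas."""
    c = math.exp(math.lgamma(0.5) - math.lgamma(0.5 * n))
    return c * (t / 2.0) ** (0.5 * (n - 1)) * math.exp(-t / 2.0)


def upper_tail_t1(t, n):
    """prob(t_1 >= t) for p = 2, as read from (6.2)."""
    return (1.0 - chi2_cdf(2.0 * t, 2 * n)) - _gap_term(t, n) * (1.0 - chi2_cdf(t, n + 1))


def upper_tail_t2(t, n):
    """prob(t_2 >= t) for p = 2, as read from (6.3)."""
    return (1.0 - chi2_cdf(2.0 * t, 2 * n)) + _gap_term(t, n) * chi2_cdf(t, n + 1)


for t in (0.5, 1.0, 3.0):
    # For n = 1 these should print (approximately) 0 and exp(-t/2).
    print(t, upper_tail_t1(t, 1), upper_tail_t2(t, 1), math.exp(-t / 2.0))
```

Subtracting the two expressions gives $\operatorname{prob}(t_1 < t \le t_2)$ equal to the shared factor alone, which is a convenient consistency check on any transcription of the formulas.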


7. RELATED PROBLEMS AND CONCLUDING REMARKS 


In some situations we face a slightly different problem. Here $\Sigma = \sigma^2 \Lambda$ where $\Lambda$ is a known positive definite matrix and $\sigma^2$ an unknown positive number. An unbiased estimate $s^2$ of $\sigma^2$, independent of the estimate $\bar{z}$ of $\mu$, is available. The quantity $ns^2/\sigma^2$ is a realisation of a chi-square variable with $n$ degrees of freedom. A situation of this kind arises, for example, if we want to determine a tolerance region for the distribution of estimates of regression parameters in a linear model. The procedure of Section 3 applies to this case also if we set
nja =, ee doke 
and va = Pia: 0, n). ve (7.2) 
A problem closely related to that which we have been discussing in Sections 1 to 6 is that of determining from a random sample from the population a (random) region $R$ regarding which we can make the following assertion:
$$E\{\nu(R; \mu, \Sigma)\} = \alpha. \tag{7.3}$$
The region $R_k$ of Section 3 will satisfy this requirement if we choose $k$ so that $kN(n-p+1)/[p(N+1)]$ is the upper $100(1-\alpha)$ percent point of the F-distribution with $p$ and $n-p+1$ degrees of freedom. Fraser and Guttman (1956) prove that among regions satisfying condition (7.3), $R_k$ is, in many respects, best.
In the practical application of the procedure of Section 3, it would be convenient to have at hand a table of values of $K$ for various values of $N$, $n$ and $p$. Such a table we hope to make available at a later date.*


* Tables required in the univariate case are given by Weissberg and Beatty (1960).
ON TABLES OF RANDOM NUMBERS


By A. N. KOLMOGOROV 
Academy of Sciences, USSR 


1. INTRODUCTION 


The set theoretic axioms of the calculus of probability, in formulating which I had the opportunity of playing some part (Kolmogorov, 1950), had solved the majority of formal difficulties in the construction of a mathematical apparatus which is useful for a very large number of applications of probabilistic methods, so successfully that the problem of finding the basis of real applications of the results of the mathematical theory of probability became rather secondary to many investigators.


I have already expressed the view [see Kolmogorov (1950), Chapter I] that the basis for the applicability of the results of the mathematical theory of probability to real 'random phenomena' must depend on some form of the frequency concept of probability, the unavoidable nature of which has been established by von Mises in a spirited manner. However, for a long time I had the following views.

(1) The frequency concept based on the notion of limiting frequency as the number of trials increases to infinity, does not contribute anything to substantiate the applicability of the results of probability theory to real practical problems where we have always to deal with a finite number of trials.

(2) The frequency concept applied to a large but finite number of trials does not admit a rigorous formal exposition within the framework of pure mathematics.

Accordingly I have sometimes put forward the frequency concept which involves the conscious use of certain not rigorously formal ideas about 'practical reliability', 'approximate stability of the frequency in a long series of trials', without the precise definition of the series which are 'sufficiently large' etc. [see Foundations of the Theory of Probability, Chapter I and, for more details, Great Soviet Encyclopaedia (section on Probability) and Mathematika iou metod i Znachenye (Chapter on Probability Theory)].
I still maintain the first of the two theses mentioned above. As regards the second, however, I have come to realise that the concept of random distribution of a property in a large finite population can have a strict formal mathematical exposition. In fact, we can show that in sufficiently large populations the distribution of the property may be such that the frequency of its occurrence will be almost the same for all sufficiently large sub-populations, when the law of choosing these is sufficiently simple. Such a conception in its full development requires the introduction of a measure of the complexity of the algorithm. I propose to discuss this question in another article. In the present article, however, I shall use the fact that there cannot be a very large number of simple algorithms.

For definiteness we shall consider the table
$$T = (t_1, t_2, \ldots, t_N)$$
of $N$ zeros and ones: $t_k = 0$ or $1$.
Such a table will be called random, if, while choosing the subset $A$ of sufficiently large size from $\overline{1, N}$ (the integers $1, 2, \ldots, N$) by different methods there is a stability in the frequency
$$\pi(A) = \frac{1}{n} \sum_{k \in A} t_k$$
of appearance of ones in $A$. One can, for example, choose $A$ as

(a) the set of first $n$ even integers $2, 4, 6, \ldots, 2n$;

(b) the set of first $n$ prime numbers $p_1, p_2, \ldots, p_n$, and so on.

The ordinary notion of 'randomness' of a table $T$ does not consist merely of the stability of the frequencies while choosing $A$ by methods entirely independent of the composition of the table $T$. One can, for example, choose the set $A$ as

(c) the set of first $n$ values $k \ge 2$ for which $t_{k-1} = 0$;

(d) the set of first $n$ values $k > s$ for which
$$t_{k-1} = a_1,\ t_{k-2} = a_2,\ \ldots,\ t_{k-s} = a_s;$$
(e) the set of the first $n$ even numbers $k = 2i$ for which $t_i = 1$;

(f) the set of numbers $k_1, k_2, \ldots, k_i, \ldots$ chosen according to the law
PS 


kisi = kët lH pi 
and so on. 


The precise formulation of the concept of 'admissible algorithm' of choosing the set $A$ will be given in Section 2.

If while using a table of sufficiently large size $N$ at least one single test of randomness of this type with sufficiently large size of the sample $n$ leads to a 'significant' departure from the principle of frequency stability then we immediately reject the hypothesis of 'pure random' origin of the given table.
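The selection rules listed in the introduction can be tried on a small table. The sketch below is only an illustration: it assumes the readings "(a) the first $n$ even integers", "(c) the first $n$ values $k \ge 2$ with $t_{k-1} = 0$" and "(e) the first $n$ even $k = 2i$ with $t_i = 1$", and the names `frequency`, `rule_a`, `rule_c` and `rule_e` are hypothetical. The strictly alternating table has overall frequency 1/2, yet each of these rules picks out a subset consisting entirely of ones, so the table fails the test of frequency stability.

```python
def frequency(table, A):
    """Frequency of ones among the positions in A (positions are 1-based)."""
    return sum(table[k - 1] for k in A) / len(A)


def rule_a(table, n):
    """(a): the first n even integers 2, 4, ..., 2n; independent of the table."""
    return [2 * i for i in range(1, n + 1)]


def rule_c(table, n):
    """(c): the first n values k >= 2 for which t_{k-1} = 0."""
    picked = [k for k in range(2, len(table) + 1) if table[k - 2] == 0]
    return picked[:n]


def rule_e(table, n):
    """(e): the first n even numbers k = 2i for which t_i = 1."""
    picked = [2 * i for i in range(1, len(table) // 2 + 1) if table[i - 1] == 1]
    return picked[:n]


# Alternating table t = 0, 1, 0, 1, ...: half the entries are ones.
table = [0, 1] * 20
print(sum(table) / len(table))
for rule in (rule_a, rule_c, rule_e):
    A = rule(table, 10)
    print(rule.__name__, A, frequency(table, A))
```

Note that rules (c) and (e) decide whether $k$ enters $A$ from entries other than $t_k$ itself, which is the feature the formal definition of Section 2 preserves.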
2. ADMISSIBLE ALGORITHMS OF SELECTION AND $(n, \varepsilon)$-RANDOM TABLES

An admissible algorithm of choosing the set
$$A = R(T) \subseteq \overline{1, N}$$
according to the table $T$ of size $N$ is defined by the functions*
$$\begin{aligned}
&F_0,\quad G_0,\quad H_0 \\
&F_1(\xi_1, \tau_1),\quad G_1(\xi_1, \tau_1),\quad H_1(\xi_1, \tau_1) \\
&F_2(\xi_1, \tau_1; \xi_2, \tau_2),\quad G_2(\xi_1, \tau_1; \xi_2, \tau_2),\quad H_2(\xi_1, \tau_1; \xi_2, \tau_2) \\
&\ldots \\
&F_{N-1}(\xi_1, \tau_1; \ldots; \xi_{N-1}, \tau_{N-1}),\quad G_{N-1}(\xi_1, \tau_1; \ldots; \xi_{N-1}, \tau_{N-1}),\quad H_{N-1}(\xi_1, \tau_1; \ldots; \xi_{N-1}, \tau_{N-1})
\end{aligned}$$
where the arguments $\tau_k$ and the functions $G_k$ and $H_k$ take values 0 or 1 and the arguments $\xi_k$ and functions $F_k$ take values from $\overline{1, N}$. The functions $F_k$ are subject to an additional condition
$$F_k(\xi_1, \tau_1; \ldots; \xi_k, \tau_k) \ne \xi_i, \qquad i = 1, 2, \ldots, k. \tag{2.1}$$

* The functions in the first line are constants (functions on the empty set of arguments).
Defining an algorithm is equivalent to forming the sequence
$$\begin{aligned}
x_1 &= F_0, \\
x_2 &= F_1(x_1, t_{x_1}), \\
x_3 &= F_2(x_1, t_{x_1}; x_2, t_{x_2}), \\
&\ldots \\
x_s &= F_{s-1}(x_1, t_{x_1}; \ldots; x_{s-1}, t_{x_{s-1}})
\end{aligned} \tag{2.2}$$
and determining those elements of the sequence which are found in $A$. The sequence terminates as soon as the value*
$$H_s(x_1, t_{x_1}; \ldots; x_s, t_{x_s}) = 1 \tag{2.3}$$
appears. In this case the sequence terminates with the element $x_s$. If when $k < N$ we have all the time
$$H_k(x_1, t_{x_1}; \ldots; x_k, t_{x_k}) = 0,$$
the sequence is terminated by the element $x_s$ with $s = N$, i.e., by exhausting all the elements of the set $\overline{1, N}$: in view of the condition (2.1) all the elements of the sequence (2.2) are distinct.

The set $A$ is formed from those $x_k$ for which
$$G_{k-1}(x_1, t_{x_1}; \ldots; x_{k-1}, t_{x_{k-1}}) = 1. \tag{2.4}$$

It seems to me that the given construction correctly reflects the basic concept of von Mises in its complete generality, preserving, however, the basic limitation that for determining whether $x \in \overline{1, N}$ falls in the set $A$ the value of $t_x$ is not used.
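The construction can be sketched as a short program. This is an illustrative reading, not a definitive rendering of the definition: the three families $F_k$, $G_k$, $H_k$ are represented by three Python callables that receive the history $[(x_1, t_{x_1}), \ldots, (x_k, t_{x_k})]$, so the index $k$ is simply the length of the history. The names `run_selection` and `rule_c` are hypothetical. The decision to place $x_k$ in $A$ is taken from the history before $t_{x_k}$ is read, which is the limitation stressed in the text.

```python
def run_selection(table, F, G, H):
    """Run an admissible selection algorithm on a 0/1 table; return the selected set A as a list."""
    N = len(table)
    history = []    # pairs (x_k, t_{x_k}) examined so far
    selected = []   # the set A, in order of appearance
    # Stop as soon as H returns 1, or once all N positions have been examined.
    while len(history) < N and H(history) == 0:
        x = F(history)                      # next position to examine
        if not 1 <= x <= N or any(x == seen for seen, _ in history):
            raise ValueError("F must return a position not examined before")
        if G(history) == 1:                 # decided before t_x is looked at
            selected.append(x)
        history.append((x, table[x - 1]))   # only now is t_x read
    return selected


def rule_c(n):
    """Scan positions in order; select k when t_{k-1} = 0; stop after n selections."""
    def F(history):
        return len(history) + 1

    def G(history):
        return 1 if history and history[-1][1] == 0 else 0

    def H(history):
        # Position k was selected exactly when k >= 2 and t_{k-1} = 0.
        chosen = sum(1 for _, tau in history[:-1] if tau == 0)
        return 1 if chosen >= n else 0

    return F, G, H


table = [0, 1] * 20
A = run_selection(table, *rule_c(5))
print(A, sum(table[k - 1] for k in A) / len(A))
```

If `H` returns 1 on the empty history, the loop never starts and the selected set is empty, in line with the footnote on $H_0 = 1$.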


Now let the system
$$\mathcal{R}_N = \{R\}$$
of admissible algorithms of selection (the size $N$ of the table being fixed) be given.

Definition: The table $T$ of size $N$ is called $(n, \varepsilon)$-random with respect to the system $\mathcal{R}_N$, if there exists a constant $p$, $0 < p < 1$, such that for any
$$A = R(T), \qquad R \in \mathcal{R}_N$$
with the number of elements $\nu \ge n$, the frequency
$$\pi(A) = \frac{1}{\nu} \sum_{k \in A} t_k$$
satisfies the inequality
$$|\pi(A) - p| \le \varepsilon.$$

* In particular, if $H_0 = 1$ then the selection cannot begin and the set $A$ is found to be empty.

Sometimes, it is convenient to say $(n, \varepsilon, p)$-randomness, assuming that the constant $p$ is fixed. Then the following theorem holds.


Theorem 1: If the number of elements of the system $\mathcal{R}_N$ does not exceed
$$\tau(n, \varepsilon) = \tfrac{1}{2}\, e^{2n\varepsilon^2(1-\varepsilon)}, \tag{2.5}$$
then for any $p$, $0 < p < 1$, there exists a table $T$ of size $N$ that is $(n, \varepsilon, p)$-random with respect to $\mathcal{R}_N$.

The interpretation of the estimate, contained in the theorem, is made more transparent, if we introduce the binary logarithm
$$\lambda(\mathcal{R}_N) = \log_2 \rho(\mathcal{R}_N)$$
of the number of elements $\rho$ of the system $\mathcal{R}_N$. $\lambda(\mathcal{R}_N)$ is equal to the quantity of information, which is necessary for choosing an individual element $R$ from $\mathcal{R}_N$. It is clear that in the case of large $\lambda(\mathcal{R}_N)$ the system $\mathcal{R}_N$ must contain algorithms, the very determination (and not merely the actual realisation) of which is complicated (requires for its formulation not less than $\lambda(\mathcal{R}_N)$ binary symbols).

In our theorem the condition of existence of tables which are $(n, \varepsilon)$-random with respect to $\mathcal{R}_N$ with arbitrary $p$ is written in the form of the inequality
$$\lambda(\mathcal{R}_N) \le 2 (\log_2 e)\, n\varepsilon^2(1-\varepsilon) - 1. \tag{2.6}$$
Such a qualitative formulation of the result contained in the theorem is instructive by itself. If the ratio $\lambda/n$ is sufficiently small then for any previously given $\varepsilon$ and any $N$ and $p$ there exist tables which are $(n, \varepsilon)$-random with respect to any system of admissible algorithms with
$$\lambda(\mathcal{R}_N) \le \lambda.$$
The proof of this theorem will be given in Section 3. In Section 4, we shall examine the possibility of improving the estimates contained in the theorem.
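To get a feeling for the sizes involved, the bound of Theorem 1 and its logarithmic form can be evaluated directly. The sketch below assumes the reading $\tau(n, \varepsilon) = \tfrac12 e^{2n\varepsilon^2(1-\varepsilon)}$ for (2.5) and $2(\log_2 e)\,n\varepsilon^2(1-\varepsilon) - 1$ for the right-hand side of (2.6); the function names are hypothetical.

```python
import math


def tau(n, eps):
    """Number of algorithms allowed by Theorem 1, read as (1/2) exp(2 n eps^2 (1 - eps))."""
    return 0.5 * math.exp(2.0 * n * eps ** 2 * (1.0 - eps))


def lambda_bound(n, eps):
    """The same bound in bits: 2 log2(e) n eps^2 (1 - eps) - 1."""
    return 2.0 * math.log2(math.e) * n * eps ** 2 * (1.0 - eps) - 1.0


for n in (1_000, 10_000, 100_000):
    # The two columns printed last should agree: log2(tau) equals the bound in bits.
    print(n, tau(n, 0.05), math.log2(tau(n, 0.05)), lambda_bound(n, 0.05))
```

The number of bits grows linearly in $n$ for fixed $\varepsilon$, so the admissible complexity per selected element is of order $\varepsilon^2$.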
Now we make two supplementary remarks.


Remark 1: Since the algorithm of choosing the set $A = R(T)$ is determined by the functions $F_k$, $H_k$, $G_k$ it is natural to consider two algorithms to be same when and only when their corresponding functions $F_k$, $H_k$, $G_k$ coincide. Already from this point of view the number of distinct possible algorithms of selection for a given $N$ is finite.

It is possible to hold on to a different point of view and consider two algorithms of selection to be different only in the case when they give different sets $A = R(T)$ at least for one table $T$. From such a point of view the number of distinct algorithms is further reduced. But in any case it is not greater than
$$(2^N)^{2^N} = 2^{N 2^N}.$$

The question of precise estimation of the number of admissible algorithms under the second approach is not so simple. The problem is very simple only for algorithms, by which the set $A$ is formed independently of the properties of the table $T$. Distinct number of such algorithms is equal to $2^N$ according to the number of different sets $A \subseteq \overline{1, N}$.

Remark 2: The admissible algorithms of selection from the set of all possible natural numbers were considered by Church (1940). Now, in our definition, instead of the finite table $T$ we consider an infinite sequence of zeroes and ones,
$$t_1, t_2, \ldots, t_n, \ldots$$
We assume that the values of the arguments $\xi_k$ and the function $F_k$ are arbitrary natural numbers. But we reject the requirement that the selection must stop at $s = N$ and instead assume that any (now infinite) table of functions $F_k$, $G_k$, $H_k$ is 'computable' in the sense sufficiently well-known in all the numerous propositions for such formal definitions. Under these considerations we obtain the inessential generalisation of Church's concept. The basis of Church's result is the existence, for any $p$, of sequences $t_1, t_2, \ldots, t_n, \ldots$ the density of which is equal to $p$ in any infinite¹ set $A$ obtained by an admissible algorithm.


3. PROOF OF THEOREM 1


This result belongs to the Theory of Finite Algorithms and its formulation does not contain any concept borrowed from Probability Theory. If, in proving this, we make use of certain results of Probability Theory then this proof will have a formal character as it would only include a certain distribution of 'weights' in the set of tables $T$ of size $N$, the weight
$$P(T) = p^M (1-p)^{N-M}$$
being assigned to the table containing $M$ ones. This method of proof does not affect the logical nature of the theorem itself, and does not hinder its use in the discussions needed for defining the domain of applicability of Probability Theory.

In another paper we shall prove the following inequality relating to the 'Bernoulli Scheme':
$$P\left\{ \sup_{k \ge n} \left| \frac{\mu_k}{k} - p \right| > \varepsilon \right\} \le 2 e^{-2n\varepsilon^2(1-\varepsilon)}. \tag{3.1}$$
Here $p$ is the probability of success in each of a sequence of independent trials; $\mu_k$ is the number of successes in the first $k$ trials. We can easily derive the following corollary from (3.1).


Corollary: Let
$$P(\xi_k = 1 \mid k \le \nu;\ \xi_1, \ldots, \xi_{k-1}) = p$$
(see footnote 2), where $\xi_1, \xi_2, \ldots, \xi_\nu$ is a sequence of a random number of random quantities and $p$ is a constant. Then
$$P\left\{ \nu \ge n,\ \left| \frac{1}{\nu} \sum_{k=1}^{\nu} \xi_k - p \right| > \varepsilon \right\} \le 2 e^{-2n\varepsilon^2(1-\varepsilon)}. \tag{3.2}$$

1 In this concept of Church substantial interest lies only in algorithms which extend infinitely. That is why, in this case, the functions $G_k$ and all that is connected with these functions must be omitted.

2 We are concerned here with the conditional probability that $\xi_k = 1$ when $k \le \nu$ and $\xi_1, \ldots, \xi_{k-1}$ are given.

We shall now examine the system $\mathcal{R}_N$ of admissible algorithms, $\rho$ in number. We consider a table formed randomly with probability $p$ for $t_k = 1$ independently of the values taken by the other $t_m$. If we fix $R \in \mathcal{R}_N$ and denote by
$$\xi_1, \xi_2, \ldots, \xi_\nu$$
those elements of the sequence
$$\tau_1, \tau_2, \ldots, \tau_s$$
which fall in $A = R(T)$ (numbering them as they appear in the course of the selection), it can easily be seen that the conditions under which (3.2) is valid are fulfilled. Hence the probability that, for any given $R \in \mathcal{R}_N$, the number of elements $\nu$ of the set $A$ will not be less than $n$ and the inequality $|\pi(A) - p| > \varepsilon$ will also be satisfied, will be less than $2 e^{-2n\varepsilon^2(1-\varepsilon)}$. If
$$\rho \le \tfrac{1}{2}\, e^{2n\varepsilon^2(1-\varepsilon)}$$
then the sum of the probabilities of failure of the inequality
$$|\pi(A) - p| \le \varepsilon$$
for those algorithms which lead to the sets with not less than $n$ elements will be less than unity. Hence with positive probability the table $T$ will be found to be $(n, \varepsilon, p)$-random in the sense of the definition of Section 2. Hence follows the existence of tables which are $(n, \varepsilon, p)$-random with respect to $\mathcal{R}_N$ (indeed independently of the probabilistic assumptions on the distribution of $P(T)$ in the space of tables).
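The counting step of this proof can be illustrated numerically in a weaker, fixed-$n$ form. The sketch below does not verify the supremum inequality (3.1) itself, whose proof the author defers to another paper; it only computes the exact two-sided binomial tail at a single $n$, which is a necessary consequence of the bound read as $2e^{-2n\varepsilon^2(1-\varepsilon)}$, and then shows the union bound over $\rho$ algorithms. The function names are hypothetical.

```python
import math


def two_sided_tail(n, p, eps):
    """Exact P(|S/n - p| > eps) for S ~ Binomial(n, p)."""
    return sum(
        math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
        for k in range(n + 1)
        if abs(k / n - p) > eps
    )


def failure_bound(n, eps):
    """Bound on the failure probability of a single algorithm, read as 2 exp(-2 n eps^2 (1 - eps))."""
    return 2.0 * math.exp(-2.0 * n * eps ** 2 * (1.0 - eps))


n, p, eps = 400, 0.5, 0.1
print(two_sided_tail(n, p, eps), failure_bound(n, eps))

# Union bound: with rho algorithms the total failure probability is at most
# rho * failure_bound, which stays at or below 1 whenever rho <= (1/2) exp(2 n eps^2 (1 - eps)).
rho = math.floor(0.5 * math.exp(2.0 * n * eps ** 2 * (1.0 - eps)))
print(rho, rho * failure_bound(n, eps))
```

Because the total failure probability is below one, some table must pass every test in the system at once, which is the existence statement of the theorem.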


4. ON THE POSSIBILITIES OF IMPROVING THE ESTIMATE BY THE THEOREM OF SECTION 2


If we fix $n$, $\varepsilon$, $N$, $p$, then, for an integral non-negative $\rho$ one of the two situations is possible:

(a) whatever be the system $\mathcal{R}_N$ of $\rho$ admissible algorithms of selection, there exists a table $T$ of size $N$ which is $(n, \varepsilon, p)$-random with respect to $\mathcal{R}_N$;

(b) there exists a system $\mathcal{R}_N$ of $\rho$ admissible algorithms of selection relative to which there are no $(n, \varepsilon, p)$-random tables $T$ of size $N$.

We can easily see that if the situation (a) holds for some $\rho$, then the same situation holds for every $\rho' < \rho$. It is clear that for $\rho = 0$ the situation (a) will always be true. Hence, there exists an upper bound
$$\tau(n, \varepsilon, N, p) = \sup \rho$$
of those $\rho$ for which the case (a) holds. For all $\rho$ greater than $\tau(n, \varepsilon, N, p)$ the case (b) holds.
If we put
$$\tau(n, \varepsilon) = \inf_{p,\, N} \tau(n, \varepsilon, N, p)$$
then the substance of the theorem of Section 2 can be expressed in the form of the inequality:
$$\tau(n, \varepsilon) \ge \tfrac{1}{2}\, e^{2n\varepsilon^2(1-\varepsilon)}. \tag{4.1}$$
Now taking logarithms
$$l(n, \varepsilon, N, p) = \log_2 \tau(n, \varepsilon, N, p), \qquad l(n, \varepsilon) = \log_2 \tau(n, \varepsilon),$$
we can write (2.6) in the form
$$l(n, \varepsilon) \ge 2 (\log_2 e)\, n\varepsilon^2(1-\varepsilon) - 1. \tag{4.2}$$

In fact, the main interest lies in the asymptotically precise estimation of $l(n, \varepsilon)$ when $\varepsilon$ is small and $n$ and $l(n, \varepsilon)$ are large. When
$$\varepsilon \to 0, \qquad n\varepsilon^2 \to \infty$$
we get from (4.2)
$$l(n, \varepsilon) \ge 2 (\log_2 e)\, n\varepsilon^2 + o(n\varepsilon^2). \tag{4.3}$$
We shall find later, on the other hand, that when
$$\varepsilon \to 0, \qquad n\varepsilon \to \infty$$
the relation
$$l(n, \varepsilon) \le 4n\varepsilon + o(n\varepsilon) \tag{4.4}$$
will hold. Unfortunately, I cannot remove the discrepancy between the power of $\varepsilon$ in (4.3) and (4.4).
The estimate (4.4) is a simple consequence of the following theorem the formulation of which is unfortunately somewhat complex and will become clear through the method of proof chosen by us.
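Theorem 2: If $k \le \dfrac{1 - 2\varepsilon}{4\varepsilon}$, $n \le (k-1)m$, $N \ge km$ then
$$\tau(n, \varepsilon, N, \tfrac{1}{2}) \le k\, 2^m. \tag{4.5}$$


For proving the theorem it is enough to construct, under the condition
$$k \le \frac{1 - 2\varepsilon}{4\varepsilon}, \qquad n = (k-1)m, \qquad N = km,$$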

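a system $\mathcal{R}_N$ out of
$$\rho = k\, 2^m + 1$$
admissible algorithms, for which there does not exist an $(n, \varepsilon, \tfrac{1}{2})$-random table $T$.

We partition $\overline{1, N}$ into $k$ sets $\Delta_i$, $i = 1, \ldots, k$, with $m$ elements in each. Every $\Delta_i$ contains $2^m$ subsets. We form the set
$$A_{is}, \qquad i = 1, \ldots, k; \quad s = 1, 2, \ldots, 2^m$$
by taking the union of all $\Delta_j$, $j \ne i$ and the $s$-th subset of $\Delta_i$. We form the system $\mathcal{R}_N$ from

(a) $k\, 2^m$ algorithms $R_{is}$ for selecting the sets $A_{is}$;

(b) one algorithm $R$ for selecting $A = \overline{1, N}$.

We prove that there does not exist a table $T$ which is $(n, \varepsilon, \tfrac{1}{2})$-random with respect to $\mathcal{R}_N$.

Let us take an arbitrary table $T$ and assume that it is $(n, \varepsilon, \tfrac{1}{2})$-random with respect to $\mathcal{R}_N$. Then it must contain at least $(\tfrac{1}{2} - \varepsilon)N$ zeroes and $(\tfrac{1}{2} - \varepsilon)N$ ones. Hence we can find $i$ and $j$ such that $\Delta_i$ contains $\alpha \ge (\tfrac{1}{2} - \varepsilon)m$ zeroes and $\Delta_j$ contains $\beta \ge (\tfrac{1}{2} - \varepsilon)m$ ones.
Let $\gamma = \min(\alpha, \beta) \ge (\tfrac{1}{2} - \varepsilon)m$.

There exists an algorithm $R' \in \mathcal{R}_N$ ($R'' \in \mathcal{R}_N$) for selecting the set $A'$ ($A''$) which consists of the entire $\overline{1, N}$ except $\gamma$ elements in $\Delta_i$ ($\Delta_j$) which correspond to zero (one) in the table $T$. It is easy to see that the corresponding frequencies are equal to
$$\pi(A') = \frac{M}{N - \gamma}, \qquad \pi(A'') = \frac{M - \gamma}{N - \gamma},$$
where $M$ is the total number of ones in the table $T$. Let us estimate the difference between these frequencies:
$$\pi(A') - \pi(A'') = \frac{\gamma}{N - \gamma} \ge \frac{(\tfrac{1}{2} - \varepsilon)m}{N - (\tfrac{1}{2} - \varepsilon)m} > 2\varepsilon.$$

This estimate contradicts the set of inequalities
$$|\pi(A') - \tfrac{1}{2}| \le \varepsilon, \qquad |\pi(A'') - \tfrac{1}{2}| \le \varepsilon,$$
which follow from the hypothesis of $(n, \varepsilon, \tfrac{1}{2})$-randomness of the table $T$. This contradiction proves the theorem.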
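The two sets used in this proof can be built explicitly for a small table. The sketch below is illustrative only and assumes the reading of the construction as: split $1, \ldots, N$ into $k$ consecutive blocks of $m$ elements, find a block rich in zeroes and a block rich in ones, and delete $\gamma$ zeroes from the first (giving $A'$) or $\gamma$ ones from the second (giving $A''$). The names `frequency` and `extreme_sets` are hypothetical.

```python
def frequency(table, A):
    """Frequency of ones among the 1-based positions in A."""
    return sum(table[x - 1] for x in A) / len(A)


def extreme_sets(table, k, m):
    """Return (A', A'', gamma) for a table of size N = k * m split into k consecutive blocks."""
    N = k * m
    if len(table) != N:
        raise ValueError("table must have exactly k * m entries")
    blocks = [range(i * m + 1, (i + 1) * m + 1) for i in range(k)]
    zeros = [[x for x in b if table[x - 1] == 0] for b in blocks]
    ones = [[x for x in b if table[x - 1] == 1] for b in blocks]
    i = max(range(k), key=lambda b: len(zeros[b]))   # block with the most zeroes
    j = max(range(k), key=lambda b: len(ones[b]))    # block with the most ones
    gamma = min(len(zeros[i]), len(ones[j]))
    drop_zeros = set(zeros[i][:gamma])
    drop_ones = set(ones[j][:gamma])
    A1 = [x for x in range(1, N + 1) if x not in drop_zeros]   # A': all but gamma zeroes of block i
    A2 = [x for x in range(1, N + 1) if x not in drop_ones]    # A'': all but gamma ones of block j
    return A1, A2, gamma


table = [0, 1] * 10            # N = 20, M = 10 ones
A1, A2, gamma = extreme_sets(table, k=4, m=5)
f1, f2 = frequency(table, A1), frequency(table, A2)
print(gamma, f1, f2, f1 - f2, gamma / (len(table) - gamma))
```

Each of the two sets is the union of $k - 1$ whole blocks and a subset of the remaining block, so both belong to the family $A_{is}$ and contain at least $(k-1)m$ elements.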
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REMARKS ON THE BEHRENS-FISHER PROBLEM 


By Yu. V. LINNIK 
Mathematical Institute, Leningrad 


1 
The Behrens-Fisher problem is well known in the statistical literature (see 
for references J. Neyman’s report at the Amsterdam Mathematical Congress, 1954). 
We are given two independent samples

$$x_1, \ldots, x_m \in N(a_1, \sigma_1); \qquad y_1, \ldots, y_n \in N(a_2, \sigma_2)$$

consisting of independent identically distributed observations belonging to two normal populations with the parameters $a_1, \sigma_1$; $a_2, \sigma_2$. The general Behrens-Fisher problem consists in constructing the tests for the hypothesis $H_0 : a_1 = a_2$ when nothing is known about the ratio $\sigma_1/\sigma_2$.

We consider the tests based upon the similar zones excluding the nuisance parameters $\sigma_1$ and $\sigma_2$. They will be determined by the statistics $t$ which we shall call similar statistics. These are the statistics $t$ measurable with respect to all the probability measures induced by $N(a_1, \sigma_1)$ and $N(a_2, \sigma_2)$; for a similar statistic we have by definition: $P(t < \xi)$ for any $\xi$ does not depend upon $\sigma_1$ and $\sigma_2$ on the hypothesis $H_0 : a_1 = a_2$.

It is well known that there are non-trivial similar statistics (Romanovsky-Bartlett-Scheffé tests) (see Barankin, 1950). The system of sufficient statistics is formed by $\bar x$, $\bar y$, $s_1^2$, $s_2^2$. It is well known that the Neyman structures (see Lehmann, 1959) in the case of this particular problem produce similar statistics $t$ indeed, but these are useless because $P(t < \xi)$ is independent of all the parameters of the problem and so cannot discern $H_0$ from the alternatives.

If $\sigma_1 = \sigma_2$ we obtain the classical problem of Student. Student's similar statistic excluding $\sigma_1 = \sigma_2$ is well known as:
hi w A 
We remark that this statistic depends explicitly only upon $\bar x - \bar y$, $s_1^2$, $s_2^2$ and the ratio of the sample sizes $\frac{m}{n}$. For Student's problem the system of sufficient and necessary statistics is

$$\Big\{ \sum_i x_i + \sum_j y_j, \ \sum_i x_i^2 + \sum_j y_j^2 \Big\}$$

and $t_0$ is independent of them thus forming a Neyman structure. One of the aspects of the Behrens-Fisher problem consists in the investigation of the similar statistics which are dependent only upon the sufficient statistics $\bar x$, $\bar y$, $s_1^2$, $s_2^2$ (and so cannot be nontrivial Neyman structures). They are usually considered to be more or less formally similar to the Student statistic $t_0$ (see Wald, 1955).
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We shall consider here the similar statistics for the Behrens-Fisher problem 
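which are in a certain respect similar to the Student statistic $t_0$. Namely, we shall consider the statistic $t$ with the following properties.

(I) $t = t\left(\bar x - \bar y, \ s_1^2, \ s_2^2, \ \frac{m}{n}\right)$, which depends only upon the arguments $\bar x - \bar y$, $s_1^2$, $s_2^2$, $\frac{m}{n}$. The sample sizes enter into $t$ only implicitly and in the form of the ratio $\frac{m}{n}$.

Denote $\bar x - \bar y = X$; $s_1^2 = u$; $s_2^2 = v$. Let $\xi > 0$ be any positive number. We require that

(II) $t\left(\xi X, \ \xi^2 u, \ \xi^2 v, \ \frac{m}{n}\right) = t\left(X, \ u, \ v, \ \frac{m}{n}\right)$ (homogeneity property).

In view of the property (II), we have for any $v > 0$:

$$t\left(X, u, v, \frac{m}{n}\right) = t\left(\frac{X}{\sqrt v}, \frac{u}{v}, 1, \frac{m}{n}\right) = \varphi\left(\frac{X}{\sqrt v}, \frac{u}{v}, \frac{m}{n}\right).$$

We introduce the property

(III) $\varphi\left(\frac{X}{\sqrt v}, \frac{u}{v}, \frac{m}{n}\right)$, for the fixed $\frac{m}{n}$, is a continuous function in both variables

$$\xi = \frac{X}{\sqrt v}, \qquad \eta = \frac{u}{v} \qquad \text{for } -\infty < \xi < \infty; \ 0 < \eta < \infty.$$

The Student's statistic $t_0$ (cf. (1.1)) will obviously satisfy the requirements (I), (II) and (III).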
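A homogeneity requirement of this kind, and the reduction to two variables that it allows, can be checked numerically. The sketch below is illustrative only: the statistic `t_stat` is a hypothetical Welch-type ratio chosen for the demonstration (it is not the statistic of the paper), and it is assumed that scaling the observations by a factor scales the mean difference X by that factor and the variance-type arguments u, v by its square.

```python
import math


def t_stat(x_diff, u, v, ratio):
    """Hypothetical statistic depending on X, u, v and the ratio m/n only."""
    return x_diff / math.sqrt(u / ratio + v)


def phi(xi, eta, ratio):
    """Reduced form: the same statistic evaluated at v = 1."""
    return t_stat(xi, eta, 1.0, ratio)


def is_homogeneous(x_diff, u, v, ratio, scale):
    """Check t(scale*X, scale^2*u, scale^2*v) == t(X, u, v) numerically."""
    scaled = t_stat(scale * x_diff, scale ** 2 * u, scale ** 2 * v, ratio)
    return math.isclose(scaled, t_stat(x_diff, u, v, ratio))


if __name__ == "__main__":
    X, u, v, ratio = 1.5, 2.0, 3.0, 0.5
    print(is_homogeneous(X, u, v, ratio, scale=2.0))
    # Taking the scale 1/sqrt(v) leaves only X/sqrt(v) and u/v as arguments.
    print(math.isclose(phi(X / math.sqrt(v), u / v, ratio), t_stat(X, u, v, ratio)))
```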


2 


We formulate now the two theorems which are to be proved in this paper.

Theorem 1: A non-trivial similar statistic $\tau$ for the Behrens-Fisher problem cannot be continuous. It must have the discontinuity points at least at each point of the set $x_1 = \ldots = x_m = y_1 = \ldots = y_n$.

Theorem 2: Any similar statistic $t = t\left(\frac{X}{\sqrt v}, \frac{u}{v}, \frac{m}{n}\right)$ for the Behrens-Fisher problem, satisfying the requirements (I), (II) and (III), must be constant for $X = 0$: $t\left(0, \frac{u}{v}, \frac{m}{n}\right)$ does not depend upon $\frac{u}{v}$.


3 


We now pass to the proof of Theorem 1. Let $\tau$ be a non-trivial similar statistic for the Behrens-Fisher problem. Put $a_1 = a_2 = a$. Consider $a$ as a given and fixed number, not a parameter. The likelihood function will be obviously:

$$L(x_1, \ldots, x_m, y_1, \ldots, y_n) = (2\pi)^{-\frac{m+n}{2}} \sigma_1^{-m} \sigma_2^{-n} \exp\left( -\frac{1}{2\sigma_1^2} \sum_{i=1}^{m} (x_i - a)^2 - \frac{1}{2\sigma_2^2} \sum_{j=1}^{n} (y_j - a)^2 \right).$$

Now $a$ being a fixed number we see that $\sum_{i=1}^{m} (x_i - a)^2$ and $\sum_{j=1}^{n} (y_j - a)^2$ are sufficient statistics for the parameters $\frac{1}{\sigma_1^2}$ and $\frac{1}{\sigma_2^2}$. It is obvious that the conditions of the Lehmann-Scheffé theorem on complete systems of sufficient statistics are satisfied (cf. Lehmann (1954), p. 132, Theorem 1). Hence $\tau$ must be a Neyman structure and thus independent of $\left( \sum_{i=1}^{m} (x_i - a)^2, \ \sum_{j=1}^{n} (y_j - a)^2 \right)$. Suppose $\tau(x_1, \ldots, x_m, y_1, \ldots, y_n)$ to be continuous at the point $(a, a, \ldots, a)$, but non-trivial. Hence, there exist $\delta > 0$ and $\epsilon > 0$ such that

$$P\{ |\tau(x_1, \ldots, x_m, y_1, \ldots, y_n) - \tau(a, a, \ldots, a)| > \delta \} > \epsilon. \qquad (3.1)$$

But $\tau$ must be independent of the pair

$$\left( \sum_{i=1}^{m} (x_i - a)^2, \ \sum_{j=1}^{n} (y_j - a)^2 \right).$$


m n 
If wo put Z (v;a) = s; È (y,—a)? = s, 


i=1 ja 
then for a sufficiently small £p Jaj—al and lyy—alj will be sufficiently small 
(i= 1, 2,...,m; j = 1,2,...,n). But then we shall have : 


KA (oo) w (8.2) 
which contradicts (3.1).
As this is true for any point $x_1 = \ldots = x_m = y_1 = \ldots = y_n = a$ our assertion is proved.
4 
I E E, 2) 


be a similar statistic for the Behrens-Fisher problem satisfying the requirements (1), 
(II) and (III); we have : 


Plto < E) “E (4.1) 
which does not depend upon c, and oy if a, = ay. . 
E TE E 
Denote of = 9,3 08 = Hs =, 
x = AX: a Uy 3 v = 0. 
and put atA ~ 1? 8, a ge 
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, Uo a es 
Then X, eN (0, = is a normal variable, while u, = tË NT RË are X 


type variables; X}, 1, vı, are stochastically independent. Hence, we have : 


? 


m X 14 m hare ði— A, 

=i 1+v0)}, ou where 
i(Gu E) (rt v =a) 0, 
which must have the distribution independent of 0, for any 0 > 0. The same property 
must hold for any continuous. function W(t). Hence it follows easily that the 
Lebesgue sets: £e  (Ë KA) for any A must have the measure induced by X, 


M1," 
%, v, independent upon 0 (this is also sufficient for t to be.a similar statistic). Let 


now No, the ratio n? remaining 'constant. ‘Then the probability measure induced 


by a “obviously becomes concentrated around the point : 
v v 
pee x - 03) y= 
VË v 


Now let the value of ¢ in the point č = 0, n— 1 for@—1 be : :(o, LT, E = ty 


say. For a fixed arbitrarily small £, consider thelLebesgne set: 


e(j—E KË te). 
For n00 the measure of this set, induced by X. 
for a given value of 0 > 0, we have : 


1(0, 0, LZ) =y At 


As tis continuous, for sufficiently small £, we shall have : 


v W, V, will converge to 1. Now, 


p| h—e <#(0, 0, La) Shte)-o 


But this probability must not depend upon the value of 0. He 


for n> co, 


nee ty = t,, and 
t (0, 0,1, A must be a constant. 
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A NOTE ON DETERMINATION OF SAMPLE SIZE 
By M. N. MURTHY 


Indian Statistical Institute 


SUMMARY. In this note a procedure of determining the sample size is proposed, where the idea is to fix the sample size in such a way that the probability (P) of the length (L) of the confidence interval (associated with a specified confidence coefficient) for the parameter ($\mu$) being less than a given value ($k\mu$) is a pre-specified quantity.


Let $y$ be normally distributed with mean $\mu$ and standard deviation $\sigma$. Suppose a sample of $N$ units is drawn with equal probability. Then the mean $\bar y$ based on the $N$ observations is normally distributed with mean $\mu$ and standard error $\sigma/\sqrt{N}$. Let $s^2$ be an unbiased estimator of $\sigma^2$ based on a sub-sample of $n$ units (or on $n$ random sub-groups of $N/n$ units).


sË wo 
It is well known that the statistic

$$\frac{\bar y - \mu}{s/\sqrt{N}}$$

is distributed as Student's $t$ with $(n-1)$ degrees of freedom.

Using the tabulated values of the $t$-distribution, we can set up a confidence interval for $\mu$ at any specified level of confidence $(1-\alpha)$. That is, if $t_\alpha$ is the $\alpha\%$ point of $t$, then

$$P\left( |\bar y - \mu| \le t_\alpha s/\sqrt{N} \right) = 1 - \alpha. \qquad (1)$$

The length $L$ of the confidence interval is given by

$$L = 2 t_\alpha s/\sqrt{N}. \qquad (2)$$

Suppose the sample size is to be so fixed that

$$P(L < k\mu) = 1 - \beta, \qquad (3)$$

where $k$ is a pre-specified quantity and $(1-\beta)$ may be taken as the second level of confidence, the first level of confidence being $(1-\alpha)$ in (1). It may be noted that $P(L < k\mu)$ is a function of the sample size and increases with increase in sample size.


For finding the sample size which could satisfy both the levels of confidence 
given in (1) and (3), we may proceed as follows. 


$$P(L < k\mu) = 1 - \beta,$$

that is,

$$P\left( 2 t_\alpha s/\sqrt{N} < k\mu \right) = 1 - \beta,$$

that is,

$$P\left( \frac{(n-1)s^2}{\sigma^2} < \frac{k^2 N (n-1)}{4 c^2 t_\alpha^2} \right) = 1 - \beta, \qquad (4)$$

where $c$ is the population coefficient of variation $(\sigma/\mu)$ and $(n-1)s^2/\sigma^2$ is a $\chi^2$ with $(n-1)$ degrees of freedom. Reducing (4) to an incomplete $\Gamma$-function which is already tabulated, we get

$$P(L < k\mu) = I(u, p), \qquad (5)$$

where

$$I(u, p) = \frac{1}{\Gamma(p)} \int_0^{u\sqrt{p}} e^{-x} x^{p-1} \, dx, \qquad p = (n-1)/2, \qquad u = \frac{k^2 N \sqrt{n-1}}{4\sqrt{2}\, c^2 t_\alpha^2}.$$

For given values of $(1-\alpha)$, $(1-\beta)$, $c$, $n$ and $k$ we can first get the value of $u$ such that

$$I(u, p) = 1 - \beta$$

and then get the required sample size

$$N = \frac{4\sqrt{2}\, u\, c^2 t_\alpha^2}{k^2 \sqrt{n-1}}. \qquad (6)$$

The usual procedure of determining the sample size consisted in finding the value of $N$ such that $E(L)$ is equal to a specified value $k\mu$. The proposed procedure given in this note is a generalization of the usual procedure in the sense that it ensures a pre-specified value for the probability that $L$ is less than $k\mu$.
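The procedure can be sketched numerically as follows. This is a minimal illustration rather than the author's computation: instead of reading $u$ from tables of the incomplete $\Gamma$-function, it finds the $(1-\beta)$ point of $\chi^2$ with $(n-1)$ degrees of freedom by bisection and substitutes it in (4), which gives the same $N$ as (6). The function names are hypothetical, and `t_alpha` is assumed to be supplied by the user from a $t$-table.

```python
import math


def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma function P(a, x), by power series."""
    if x <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    k = 0
    while term > 1e-17 * total and k < 10000:
        k += 1
        term *= x / (a + k)
        total += term
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def chi2_cdf(x, df):
    """Distribution function of chi-square with df degrees of freedom."""
    return reg_lower_gamma(df / 2.0, x / 2.0)


def chi2_quantile(prob, df):
    """Point x with chi2_cdf(x, df) = prob, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < prob:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def sample_size(k, c, n, t_alpha, beta):
    """Sample size N for which P(L < k*mu) = 1 - beta, from relation (4)."""
    x = chi2_quantile(1.0 - beta, n - 1)
    return 4.0 * c ** 2 * t_alpha ** 2 * x / (k ** 2 * (n - 1))


if __name__ == "__main__":
    # Illustrative inputs only: c = 0.5, k = 0.1, n = 10 sub-groups,
    # t_alpha = 2.262 (two-sided 5% point of t with 9 d.f., from tables).
    print(math.ceil(sample_size(k=0.1, c=0.5, n=10, t_alpha=2.262, beta=0.10)))
```

In practice the result would be rounded up to the next integer, as in the usage line.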


Paper received : February, 1962. 




ON SPECTRAL ANALYSIS WITH MISSING OBSERVATIONS 
AND AMPLITUDE MODULATION* 


By EMANUEL PARZEN 
Stanford University 


SUMMARY. The notion of an asymptotically stationary time series and its spectral analysis was considered by the author (Parzen, 1961b). An important example of an asymptotically stationary time series is an amplitude modulated stationary time series. In this note, the problem of spectral analysis of stationary normal time series with missing observations, recently treated by Jones (1962), is treated as a special case of the problem of spectral analysis of an amplitude modulated stationary normal time series.


1. INTRODUCTION

Let $\{X(t), t = 1, 2, \ldots\}$ be a discrete parameter time series with zero means and finite second moments. It is said to be weakly (see Doob, 1953) or covariance (see Parzen, 1962) stationary if there exists a function, denoted $R(v)$ and called the covariance function of the time series, such that for $v = 0, 1, 2, \ldots$,

$$R(v) = E[X(t) X(t+v)] \qquad (1.1)$$

independently of $t = 1, 2, \ldots$. It is said to be asymptotically (weakly) stationary if instead of (1.1) it holds that

$$R(v) = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} E[X(t) X(t+v)]. \qquad (1.2)$$

If either (1.1) or (1.2) holds, the time series is said to be ergodic if the sample covariance function

$$R_T(v) = \frac{1}{T} \sum_{t=1}^{T-v} X(t) X(t+v) \qquad (1.3)$$

is, for $v = 0, 1, \ldots$, a consistent in quadratic mean estimate of $R(v)$. In order for this to be the case it is necessary and sufficient that for each $v$

$$\lim_{T \to \infty} \mathrm{var}[R_T(v)] = 0. \qquad (1.4)$$

One important way in which asymptotically stationary time series arise is by amplitude modulating a (covariance) stationary process.

Let $\{Y(t), t = 1, 2, \ldots\}$ be a stationary time series with zero means and covariance function

$$R_Y(v) = E[Y(t) Y(t+v)]. \qquad (1.5)$$


*Prepared under Contract Nonr-225 (21), (NR-042-993), for Office of Naval Research; reproduction in whole or in part is permitted for any purpose of the United States Government.
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Let {g(t), t = 1, 2, ...} be a non-random bounded function possessing a genera- 
lized harmonic analysis in the sense that for v = 0,1,... 


$$R_g(v) = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} g(t) g(t+v) \qquad (1.6)$$

exists. The time series

$$X(t) = g(t) Y(t) \qquad (1.7)$$

may be called the original time series $Y(\cdot)$ amplitude modulated by the function $g(\cdot)$. Since

$$E[X(t) X(t+v)] = g(t) g(t+v) R_Y(v) \qquad (1.8)$$

it is clear that while $X(\cdot)$ is not covariance stationary, it is asymptotically stationary with covariance function $R_X(v)$ given by

$$R_X(v) = R_g(v) R_Y(v). \qquad (1.9)$$

It is shown by the author (Parzen, 1961b) that if $Y(\cdot)$ is an ergodic normal process, then $X(\cdot)$ is ergodic. Consequently, given observations $\{X(t), t = 1, 2, \ldots, T\}$, a consistent (in quadratic mean) estimate of $R_X(v)$ is given by the sample covariance function

$$R_{X,T}(v) = \frac{1}{T} \sum_{t=1}^{T-v} X(t) X(t+v). \qquad (1.10)$$

A consistent estimate of $R_Y(v)$ is then available at all lags $v$ for which $R_g(v) \ne 0$, namely

$$R_{Y,T}(v) = R_{X,T}(v)/R_g(v). \qquad (1.11)$$

From these facts we obtain immediately the following theorem.


Theorem 1A: Let $\{Y(t), t = 1, 2, \ldots\}$ be stationary and normal with zero means and covariance function $R_Y(v)$ satisfying

$$\lim_{T \to \infty} \frac{1}{T} \sum_{v=0}^{T} R_Y^2(v) = 0, \qquad (1.12)$$

so that $Y(\cdot)$ is ergodic. Suppose that the time series $Y(\cdot)$ is not directly observed, but rather that one observes a time series $X(\cdot)$ which is an amplitude modulated version of $Y(\cdot)$:

$$X(t) = g(t) Y(t), \quad t = 1, 2, \ldots, \qquad (1.13)$$

where $g(\cdot)$ is a non-random function possessing a covariance function $R_g(v)$ defined by (1.6). If

$$R_g(v) \ne 0, \quad v = 0, 1, \ldots, \qquad (1.14)$$

a consistent in quadratic mean estimate of the covariance function $R_Y(v)$ of the unobserved time series $Y(\cdot)$ is given by (1.11).
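The estimate described by (1.10) and (1.11) can be sketched in a few lines. The code below is an illustration under stated assumptions: the function names are hypothetical, the sample covariance divides by T and sums over the T - v available products, and the values of the covariance function of g are assumed known and passed in as a list indexed by lag.

```python
def sample_covariance(x, v):
    """Sample covariance at lag v: (1/T) * sum of x[t] * x[t+v]."""
    T = len(x)
    return sum(x[t] * x[t + v] for t in range(T - v)) / T


def estimate_ry(x, r_g, v):
    """Estimate of R_Y(v) from the modulated series x, as in (1.11)."""
    if r_g[v] == 0:
        raise ValueError("R_g(v) vanishes at this lag; R_Y(v) cannot be estimated")
    return sample_covariance(x, v) / r_g[v]


if __name__ == "__main__":
    # Toy example: g observes two points and misses one, with period 3,
    # so that R_g(0) = 2/3 and R_g(1) = R_g(2) = 1/3.
    g = [1, 1, 0, 1, 1, 0]
    y = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]       # a constant "series" for checking
    x = [gi * yi for gi, yi in zip(g, y)]
    r_g = [2 / 3, 1 / 3, 1 / 3]
    print(estimate_ry(x, r_g, 0), estimate_ry(x, r_g, 1))
```

With the constant toy series the products Y(t)Y(t+v) all equal 1, so at lags 0 and 1 the estimate recovers the value 1 despite the missing points.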
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Assume next that the series Y(s) possesses a spectral density function fy(o) 
so that 


$$R_Y(v) = \int_{-\pi}^{\pi} \cos v\omega \, f_Y(\omega) \, d\omega. \qquad (1.15)$$

Given consistent estimates $R_{Y,T}(v)$ of $R_Y(v)$, one may construct consistent estimates $f_{Y,T}(\omega)$ of $f_Y(\omega)$ in a multitude of ways by suitably choosing the weights $k_T(v)$ in the formula

$$f_{Y,T}(\omega) = \frac{1}{2\pi} \sum_{|v| < T} \cos v\omega \, k_T(v) R_{Y,T}(v); \qquad (1.16)$$

proofs of this assertion are essentially given in Parzen (1961a) and Parthasarathy (1960). We do not discuss this assertion further here since we will actually obtain a formula for the asymptotic variance of the estimate $f_{Y,T}(\omega)$.


2. MISSING OBSERVATIONS 


There exist time series $\{Y(t), t = 1, 2, \ldots\}$, defined at equally spaced intervals of time, which are systematically unobservable. For example, in radar studies of the surface of the moon, one observes a time series $Y(\cdot)$ which represents the echo (reflection from the moon) of a radar signal transmitted to the moon. In order to receive the echo, one must systematically cease transmission during the intervals in which one is receiving the echo. Another example of missing observations is the case of a time series which can be observed only during certain hours of the day.

A time series with missing observations seems to be best regarded as an amplitude modulated version of the original time series:

$$X(t) = g(t) Y(t), \quad t = 1, 2, \ldots, \qquad (2.1)$$

where (i) $Y(\cdot)$ is the time series under study, assumed to be defined at successive equally spaced points of time, (ii) $g(\cdot)$ is defined by

$$g(t) = 0 \ \text{if } Y(t) \text{ is missing at time } t, \qquad g(t) = 1 \ \text{if } Y(t) \text{ is observed at time } t, \qquad (2.2)$$

and (iii) $X(\cdot)$ represents the actually observed values of $Y(\cdot)$, with 0 inserted in the series whenever the value of $Y(t)$ is missing.

A case of particular interest is the case of systematically missing observations. Suppose that the time series $Y(\cdot)$ is periodically observed for $\alpha$ time points, then not observed for $\beta$ time points; then $g(\cdot)$ is a periodic function with period $\alpha + \beta$, and

$$g(t) = 1 \ \text{if } t = 1, \ldots, \alpha, \qquad g(t) = 0 \ \text{if } t = \alpha + 1, \ldots, \alpha + \beta. \qquad (2.3)$$
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It may be shown that a periodic function g( e) possesses a generalized harmonio 
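analysis. If the period of $g(\cdot)$ is $\theta$ (for $g(\cdot)$ defined by (2.3), $\theta = \alpha + \beta$), then its covariance function $R_g(\cdot)$ has period $\theta$ and is given by

$$R_g(v) = \frac{1}{\theta} \sum_{t=1}^{\theta} g(t) g(t+v). \qquad (2.4)$$

Thus for $g(\cdot)$ defined by (2.3), the covariance function $R_g(\cdot)$ has period $\alpha + \beta$. To determine its values for $0 \le v \le \alpha + \beta - 1$, we distinguish two cases: (i) $\alpha \le \beta$ and (ii) $\alpha > \beta$.

TABLE 1. VALUES OF $R_g(v)$

case (i): $\alpha \le \beta$                                        case (ii): $\alpha > \beta$
$(\alpha - v)/(\alpha + \beta)$, $v = 0, \ldots, \alpha$            $(\alpha - v)/(\alpha + \beta)$, $v = 0, \ldots, \beta$
$0$, $v = \alpha, \ldots, \beta$                                    $(\alpha - \beta)/(\alpha + \beta)$, $v = \beta, \ldots, \alpha$
$(v - \beta)/(\alpha + \beta)$, $v = \beta, \ldots, \alpha + \beta$   $(v - \beta)/(\alpha + \beta)$, $v = \alpha, \ldots, \alpha + \beta$

Only in the case $\alpha > \beta$ (one observes more values than one misses) does $R_g(v)$ never vanish. Therefore in order to be able to estimate $R_Y(v)$ we must assume that $\alpha > \beta$.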
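The entries of Table 1 can be checked directly from (2.4). The sketch below is a small verification aid with hypothetical function names; it computes the covariance function of the periodic 0/1 function of (2.3) over one period and compares it with a closed form that combines both cases of the table, namely the count max(0, alpha - v) + max(0, v - beta) divided by alpha + beta.

```python
from fractions import Fraction


def covariance_of_g(alpha, beta):
    """R_g(v), v = 0..alpha+beta-1, for g = 1 on alpha points, 0 on beta points."""
    theta = alpha + beta
    g = [1] * alpha + [0] * beta
    return [
        Fraction(sum(g[t] * g[(t + v) % theta] for t in range(theta)), theta)
        for v in range(theta)
    ]


def table_value(alpha, beta, v):
    """Closed form covering both cases of the table of values of R_g(v)."""
    return Fraction(max(0, alpha - v) + max(0, v - beta), alpha + beta)


if __name__ == "__main__":
    print(covariance_of_g(3, 2))   # alpha > beta: no zero entries
    print(covariance_of_g(2, 3))   # alpha <= beta: zeros at lags alpha..beta
```

The first call illustrates the remark that the covariance function never vanishes when more values are observed than missed.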


3. VARIANCE OF SPECTRAL ESTIMATES 


In this section we find the variance of the estimated spectral density function $f_{Y,T}(\omega)$ when it is formed from observations of an amplitude modulated time series $X(t)$ satisfying the assumptions of Theorem 1A. We first note that $f_{Y,T}(\omega)$, defined by (1.16), can be written

$$f_{Y,T}(\omega) = \frac{1}{2\pi T} \sum_{s,t=1}^{T} g(s) g(t) Y(s) Y(t) \cos \omega(s-t) \, k_T(s-t) \, \{R_g(s-t)\}^{-1}. \qquad (3.1)$$

In words, $f_{Y,T}(\omega)$ is a quadratic form in the time series $Y(\cdot)$.

Let $a(s, t)$ be a symmetric function of two variables, and let

$$J_T[a(s, t)] = \sum_{s,t=1}^{T} a(s, t) Y(s) Y(t) \qquad (3.2)$$

denote a quadratic form in the stationary normally distributed random variables $Y(\cdot)$ with covariance function $R_Y(v)$ and spectral density function $f_Y(\omega)$. It may be verified that

$$\mathrm{var}(J_T[a(s, t)]) = \sum_{s,t,u,v=1}^{T} a(s, t) a(u, v) \{R_Y(s-u) R_Y(t-v) + R_Y(s-v) R_Y(t-u)\}$$

$$= 2 \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} d\lambda_1 \, d\lambda_2 \, f_Y(\lambda_1) f_Y(\lambda_2) |A(\lambda_1, \lambda_2)|^2, \qquad (3.3)$$

defining

$$A(\lambda_1, \lambda_2) = \sum_{s,t=1}^{T} a(s, t) \exp[i(s\lambda_1 + t\lambda_2)]. \qquad (3.4)$$


The usual case considered in the theory of spectral analysis of stationary time series is the case where

$$a(s, t) = \frac{1}{2\pi T} \cos \omega(s-t) \, k_T(s-t) \qquad (3.5)$$

and

$$k_T(v) = k(B_T v) \qquad (3.6)$$

for a suitable weighting function $k(v)$ and constants $B_T$ satisfying

$$B_T \to 0, \quad T B_T \to \infty \quad \text{as } T \to \infty; \qquad (3.7)$$

for the exact conditions to be satisfied by the covariance averaging kernel $k(v)$ (see Parzen, 1957, p. 336). Define
2 E -2 ; 
Kolë, do) = FT ef exp [i(s,-++te»)]kp(s—t). we (8.8) 


By the argument employed in Parzen (1957, p. 342), one may show that for suitable 
(Ay, Ag) which are symmetric in the sense that 


Fàs — Àa) = f(A Ào) 


functions f 


it holds that 
lim TBy FoF f0 As) Koton Aatos) Erl — Aos —Aj— 04) dd: 
TI a SS 


o 
Fon 02) lis ku)du if w = og; Qa = OC: 


è (3.9) 
0 otherwise. 
In particular, (3.9) holds for a function f(«;, 2) of the form 
1 ke 5 
Ko 0 = On wy, ki ame [e(e4+02@)] R(r4, va) ni (3.10) 
S IR 
[Blv va)| < œ. s (8.11) 


where 
Vi, VS TO 
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A careful derivation of (3.9) would unduly lengthen the present paper. However, 
let us sketch a proof. It suffices to show that (3.9) holds for 


FA: As) = exp [H(Ayry+-Agrs)] ni (3.12) 
for arbitrary integers v, and va} Under (3.12), the double integral in (3.9) can be 
written : 


(47°T)1B, $ kals—t) kyp(u—v) I(s—u+v,)J(t—v+-v,) exp[i( 


801+ 102—UM3—V04)], 
8,t,u, v—l 
(8.13) 
defining KË = f ef dx ap sine ve (8.14) 
-r a 
In (3.13) make the change of variables 
U tv, z—u—v. 

so that s=@+u, t—y—tu, v= u—z. 
Then (3.13) becomes 

(47r?) Bp E J(+) T(Y+va)krlz)kr(z+r—y) 

T, Y, z 
5 Pa E 
exp [aes tyoz fitoj ox 7 = exp liuw — oo —0,)]. za (815) 


As T tends to co, (3.15) has the following limiting values : if 
, a ' E ©) = o and o = oy 
then it has the value : ee 


Gri $ Jeto) 


z, y=— 


Yv) exp (zo) yeo)) ' kYu)du 


(3.16) 
and 0 otherwise. To conclude the proof of (3 9) we need only note that 
-1 $ ° PJ 
(27) Re J(@+,) exp [izo] = exp [—iv,0,). (3.17) 


We next show how using (3.9) 
of the spectral density function 
are then considering quadratic 


one may derive an expression for the variance 
of an amplitude modulated normal time series. We 
forms corresponding to 


1 
as, t) = InP C8 @(s—t) ko(s—t) h(s, t), (3.18) 
defining h(s, t) = g(s) g(t) {B,(s—t)}1, (3.19) 
Ë We consider only the important Special case that g(t) is a periodic function. 
g(t) has period 9, then it possesses the harmonic representation 
N 

g(t). = a & exp [it A,] G, (3.20) 

where À 


n = 27/0, N = 9/2 or (0—1)/2 according as 0 is even or odd, and 


AL 
Go Qi exp [—isa,] g(s) for T= 0, + 


El... HOJ2), ... (8.21) 
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while e, = 1 for all n except that if ô is even e+y = 1/2. It may be verified that R,(v) 
is an even function of period 0 given by 


së 
Rto— 3 y eG,G n exp[—ivà,]. s+. (8:22) 
y . 
Now g(s) g(t) = = 2n Emën EXP [i(sAm-tA,)] Gunan 
N 
(RAS—HJT— $ së Emen EXP [2(SA,,+1A,,)] Wan ssa (8:23) 
8 * 
where, defining y= > a exp [—is à, ]{B,(s)}>, 
Ware = Wa if m = =f; (3.24) 
=0 otherwise. 
se 
Consequently h(s, t) = oe emën OXP [2(SAm+tA,) Ann =e (3.25) 
where Hra == ej Oy Wi Ca Gen = x ej W; Ga Gai bee (3.26) 
jk i 
It should be noted that 
Hoo == & W; G; G; 
g 
=F È Ris) (Ria 
s=1 
=1. a» (827) 
| i We next write 
I ADI Sy X cos'o(s—t)kip(s—t) hs, t) exp [i(sdy-+tA,)] 
ML st 
1 N = k 
“nT m, A “nen Ann ai E a=) nët) 
oxp [7(s{A,+An}+-t{As+A,})] 
- = Em En Hin HK p(AA+A,,+0, Ag FA,—o) 
| mn=-N 
\ 
8 +Kp(Ay+An—o, Ag+A,+e)}. ses (3.28) 
N 
j 


We are now in a position to evaluate 


TBy var [fy(o)] = 27By ff Ady de Frea) ACn Ag)”. 
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By (3.28) and (3.9) one sees that, as T tends to co, 


T Bp var [fyplo)l4 2 Emën JE I rn to fo) fron ot) T ku)du 
= QË enen Hal? Setotëm) flora) T Proj 


=| Flo)+ a ental Ena” felo tAn) folo-+Ag} Ë Pudu. —.. (3.29) 


The foregoing formula is valid for 0 < o < T; it should be multiplied by 2 in the 
case that ù = 0 or o = 7. 

If the spectral density function fzo) 
servations of the time series Y(e) 


(for 0 K 0 <7) 


had been directly estimated from ob- 


5 the variance of the estimate falo) would satisfy 


jim TË var [fy()] = Lo) T k (udu. (3.30) 

y Consequently one can infer from (3.29) the effect on the variance of the esti- 
mate fy(&) due to the fact that it is formed from an amplitude modulated version of 
the time series Y(e). An upper bound to this variance is 


TBy var (fy(o)] < H {max fito) F tudu (3.31) 


where H = nen | Lin, n]? (3.32) 


Thus H may be taken as a measure of the increase in variance due to amplitude modulation 


One may verify that 


9°(s) g°(t) {R,(s—t)}-2, (3.33) 


` 


An upper bound for H can be obtained as follows. 


Let p be a lower bound for R,(v) : 
(RO) >p, v—o0,1,...,8. 


(3.34) 
Then H -afl : 
SP 
SP G A ON (3.35) 
An exact evaluation of Hf can be obtained from the formul 
a 
= 1 0-1 
H=; = (Roje 1 oy 
Ta PA 2 POP oj). (8.36) 
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To illustrate the use of these expressions we consider the modulating function 
Ne), defined by (3.3), which corresponds to the case of periodically missing observa- 
tions. Then 0 = ap, 


Ah 1.8 monarg 
p= TEP g È POSË. ne (8.87) 
a SË 2 1 2 
By (3.35) Hg 5) =(7-,) e (3.38) 
defining r= A e (3.39) 


to be the ratio of the number of observations missed to the number observed. An 


exact expression for H can be obtained from (3.36) : for v = 0, 1..., 8 


—V 
> VS, 


1” at o2 pj 
7 2 ONET) a+B 


0 


=0, v>a. . (8.40) 
Consequently, 


(Ro) = P(t) g+) = aes v= 0,1; ...; 8 


_ (a—v)(a+f) 
pi Sh 


=0, v>a. 


wena 


lt, 0, a48 S (%—v)(a+/) 
Thus (ah) H = ath te Sea eae (a—BE ` 


Finally, one obtains 


ad eet gy La Tl 
H rl a—p rt rr 


1 2 2 ar 2 , &—B+1 
=o ri ee KË JË T (3.41) 


By replacing every denominator by $\alpha - \beta$, one obtains the following upper bound for $H$:

$$H \le \frac{\alpha + \beta}{\alpha - \beta} = \frac{1 + r}{1 - r}. \qquad (3.42)$$

One easily verifies that (3.42) provides a lower upper bound than does (3.38). In any event, both (3.38) and (3.42) provide some measure of how rapidly the variance of the spectral estimates increases as the ratio $r$ tends to 1.
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It may be of interest to express the variance of f. y(@) in terms of the number 
To of observations actually observed : approximately, 


QE që vi (3.48) 
“arë 
Combining (3.31) and (3.42) one sees that 
rei çë pd 


(max fi KO) ie I?(u)du 5 CEB, SË 
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A METHOD OF CONSTRUCTION OF RESOLVABLE BIBD* 


By ESTHER SEIDEN 
Michigan State University 


SUMMARY. A geometrical method of construction of resolvable BIBD with parameters $(v, b, r, k, \lambda) = (2^{2n-1} - 2^{n-1}, \ 2^{2n} - 1, \ 2^n + 1, \ 2^{n-1}, \ 1)$ is obtained, where $n$ is a positive integer greater than or equal to two.


1. INTRODUCTION

A balanced incomplete block design, BIBD, with parameters $v$, $b$, $r$, $k$ and $\lambda$ is an arrangement of $v$ treatments in $b$ blocks of size $k$ $(< v)$, such that no treatment occurs more than once in a block and every treatment occurs in exactly $r$ blocks and every pair of treatments occurs together exactly $\lambda$ times. The design is called resolvable (Bose, 1942) if the blocks can be divided into sets such that each treatment occurs in each set.

It is well known that a finite projective plane of order $2^n$, i.e. with $2^n + 1$ points on a line, may contain a set of $2^n + 2$ points, no three on one line, called henceforth an oval. If such a plane is Desarguesian then ovals consisting of $2^n + 2$ points can be effectively constructed. This fact will be used in order to construct resolvable BIBD with parameters $(2^{2n-1} - 2^{n-1}, \ 2^{2n} - 1, \ 2^n + 1, \ 2^{n-1}, \ 1)$ in which the $2^{2n} - 1$ blocks can be divided into $2^n + 1$ sets each comprising $2^n - 1$ blocks such that each variety occurs in each of the sets.
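The stated parameters can be checked against the standard necessary conditions for a BIBD, $bk = vr$ and $\lambda(v-1) = r(k-1)$, and against the block count of the resolution classes. The short sketch below does this for a few values of $n$; the function name is hypothetical, and the check confirms only arithmetic consistency, not the existence of the design.

```python
def seiden_parameters(n):
    """Parameters (v, b, r, k, lam) of the resolvable BIBD for a given n >= 2."""
    q = 2 ** n                       # order of the projective plane
    v = 2 ** (2 * n - 1) - 2 ** (n - 1)
    b = q * q - 1
    r = q + 1
    k = 2 ** (n - 1)
    return v, b, r, k, 1


def is_consistent(n):
    """Check b*k = v*r, lam*(v-1) = r*(k-1), and the resolution class sizes."""
    v, b, r, k, lam = seiden_parameters(n)
    classes, blocks_per_class = 2 ** n + 1, 2 ** n - 1
    return (
        b * k == v * r
        and lam * (v - 1) == r * (k - 1)
        and classes * blocks_per_class == b
        and blocks_per_class * k == v   # each class covers every variety once
    )


if __name__ == "__main__":
    for n in range(2, 6):
        print(n, seiden_parameters(n), is_consistent(n))
```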


2. METHOD OF CONSTRUCTION 


It was shown by the author (Seiden, 1961) that the existence of an oval consisting of $2^n + 2$ points in a Desarguesian plane of order $2^n$ enables a construction of two partially balanced incomplete block designs, PBIBD, in the following manner. Remove the points of the oval. Then consider separately the lines which contained two points of the oval and the lines which contained none of them. If the lines are identified with the blocks of the design and the points with the varieties then one obtains this way two PBIBD.

Let us now focus our attention on the PBIBD consisting of the lines which did not contain any point of the oval. This design has $2^{2n-1} - 2^{n-1}$ blocks of size $2^n + 1$, the number of varieties being $2^{2n} - 1$. Every two blocks of this design will have clearly one element in common. It is easy to see that if we interchange now the points and the lines then we will obtain a BIBD consisting of $2^{2n} - 1$ blocks. Each block corresponds to a point of the plane excluding the points of the oval. The block size will be clearly $2^{n-1}$ and the elements of the block correspond to the lines not including any points of the oval which pass through the related point. It remains to establish that the so-obtained BIBD is in fact resolvable.

*This research memorandum was partially supported by the National Science Foundation Grant, No. G18376.


Consider any point of the oval, say $O$, and a line $l$ through it. $l$ contains $2^n - 1$ points not belonging to the oval. Each of these corresponds to $2^n - 1$ different blocks of the BIBD. The $2^{n-1}$ elements of each block are the lines which do not include any point of the oval. Clearly they have to be all different and this will exhaust all $2^{2n-1} - 2^{n-1}$ lines and give one replication of the design. The same will hold in respect to the remaining $2^n$ lines passing through $O$, thus yielding the desired $2^n + 1$ sets of the resolvable design. Here is an example of a design obtained by the described method for $n = 3$.
Te 1 2’ 8 F 1 8 9 10 
5 15 22 24 Ee Oy p7 2 12 18 92 
PA OT 397 3 12 23 28 3 16 25 97 
He T6 MLS 28 4°10 16 91 4 15 20 28 
8 14 23 26 9 11 15 19 5 19 21 96 
10 13 19 25 13 1 18 27 6 18 17 23 
I 17 20 27 17 22 25 96 m Mia oz 
DATAT 1 1 15 16 Las 16 
AT 9 2 2 10 ll 26 2 15 27 
3 6 20 26 3 13 21 24 a mp 9) Wa 
4 19 24 27 i 4 8 13 29 
5 8 160 17 5 18 20 25 7 10 20 23 
10 “14 1221 28 6 8 19 2g 1 21 25 28 
15 18 21 23 9 22 23 97 12 16 24 96 
i 20 21. 99 1 23 24 95 1 26 97 28 
2 16 19 23 2 5 13 28 2 14 17 2 
3 8 11 18 3 10 15 1 3 19 99 
4 86 14 25 a NDI TRARË 4 5 n 33 
Di 10) 12) 27 6 11 16 29 6 10 18 24. 
7 13: 16 26 7 8 21 2% E 12 je ox 
17 24 28 12 14 19 20 9 13 16 20 
After the paper was submitted for publication R. C. Bose informed me that he was aware of this result since 1949 and used it in a proof in his joint paper with S. S. Shrikhande. [R. C. Bose and S. S. Shrikhande (1960): On the construction of sets of mutually orthogonal latin squares and the falsity of a conjecture of Euler, Transactions of the American Mathematical Society, 95, 191-209.]
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SOME LIMIT DISTRIBUTIONS CONNECTED WITH 
‘FIXED INTERVAL ANALYSIS* ` 


By J. SETHURAMAN 
Indian Statistical Institute 


and 
University of North Carolina 


SUMMARY. The proofs of some theorems stated by the author (Sethuraman, 1963) on the 
limiting distributions of some statistics that enter in the method of Fixed Interval Analysis are presented. 


1. INTRODUCTION 

Let (Y, X) be a random variable taking values in (4% x 2) where Y is E, the 
Euclidean space of Æ dimensions and @ is a measurable space. Let Z,, Bs... 
be g disjoint measurable sets in @ whose union is the whole space 2. 

(Yas 21) (Vata): +0 Um x,) are n independent observations on (Y, X). The 
number of vs that fall in Z; is ng, j = 1...-,g- 24 is defined by the relation 

u, = Syn: I 

where Y$ is the summation over all “i” such that æ; is in Ej. 

Throughout this paper it is assumed that 

VY) < œ oe Ly 

and prob (XeH;)=7>0 j=1,..,9 ve, (QË) 
where for any random variable Z, v(Z) denotes the variance covariance matrix of Z. 


The following theorem is established in Section 3. 


Theorem 1: The asymptotic distribution of $(u_1, \ldots, u_g)$ is the distribution of $g$ independent normal distributions.

This theorem plays a fundamental role in the method of Fixed Interval Analysis (for instance, see Sethuraman (1963)). Interpreted in Sample Survey terminology this theorem, among other things, states that the post-stratified stratum means are independently distributed in the limit.


9 


a 


Let Y(B;), 0 
H,, denote a random vari i 
a e). 
= prob (YeA, Xeli;)/prob (XeE,) 
vector of expectations of Z. | x 
r h work was done at the Indian Statistical Institute. The writing of this paper was 
* This Se Division of the Air Force Office of Scientific Research under arene niher 
supported by the Ma 
AF-FOSR-62-169. 


NOTATIONS, DEFINITIONS AND PRELIMINARIES 

alled the conditional random variable of Y given that X is in 
able on 2£ with the distribution defined by prob (Y(H,)eA) 
For any random variable Z, E(Z) denotes the 
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Define E(Y(E,) = u, wes (21) 
E VY(E,)) = 3, ; sx (22) 
Pj = nin gan (2.3) 
Vluj— u) = nj) ; s (2.4) 
VIP) = Ela) ni (2.5) 
= ig: 


Let {2,(-, 0)), n = 0, 1,... be a sequence of families of probability distri- 
butions on the Borel-subsets of H,, (or more generally, of any topological space) and 
0 vary in a compact topological space K. 


Definition: (ËN, 0)} is said to converge weakly, uniformly and continuously 


“(in other words, in the UC* sense) to &,(-, 0) with respect to 0 in K if for every bounded 
continuous function h(y) on Em 


J oy) Enldy, 0) Tat dy, 0) uniformly in 0 
and Sy) Exdy, 0) is a continuous function of 0. 


The following theorem given by the author (Sethuraman, 1961) will be used 
in Section 3. 


Theorem 2: Let (Y,, X,,) be a sequence of random variable on ( 
Sis a complete separable metric space. Let the conditional probability measure of Ya 
given that X,, = x be denoted by E,(, x) and the marginal distribution of X, be p, Let 
Enl, x) converge in the UC* sense to Éole, x) with respect to a in any compact subset of 


S and H, converge weakly to Ho. Then the joint distribution OF (Pas. Xa) converges 
weakly to the distribution determined by (+, x) and jug or, more precisely, to the distribu- 
tion of (Yo, Xo) where 


BE, KS) where 


prob {¥ 5 eA, Xo € By = EKA, x)u(da). 


Lemma which is immediate, is useful in e 


stablishing the UC* convergence 
of a special sequence of families of distributions. 


Let

$$Z_{11}, \ldots, Z_{1 k_1(\theta)}$$
$$Z_{21}, \ldots, Z_{2 k_2(\theta)}$$
$$\ldots$$
$$Z_{n1}, \ldots, Z_{n k_n(\theta)}$$

be a triangular scheme of random variables in $E_m$ where the variables in each row are identically and independently distributed. Assume that $E(Z_{n1}) = \nu_n$ and $V(Z_{n1}) = V_n$ are finite and that $V_n \to V$ as $n \to \infty$. Again let $\inf_\theta k_n(\theta) \to \infty$ as $n \to \infty$.

Let $MN(a, L)$ stand for the multivariate normal distribution with mean $a$ and variance covariance matrix $L$.
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The conditional distribution of Yi, -- 


SOME LIMIT DISTRIBUTIONS CONNECTED WITH FIXED INTERVAL ANALYSIS 


Lemma 1: The sequence of families of distributions of
$\{Z_{n1} + \cdots + Z_{nk_n(\theta)} - k_n(\theta)\nu_n\}/\sqrt{k_n(\theta)}$
converges in the UC* sense to the distribution MN(0, V) with respect to $\theta$.


3. MAIN THEOREMS 


We first prove the following lemma. 
Lemma 2: The distributions of (ni (n), ..., 9,(n)) given that t(n) = z, x zj — 0 
1 
converges in the UC* sense to the distribution MN(0, A) with respect to z in any closed 


bounded subset of B,, where 


De 0 0 
Mi 
0 See 
pa Ta ores) 
| 0 Ly 


Proof: The event &(n) = t is equivalent with probability one to the event 


, g, since 


m= [nm Vn z] VE Nju 
prob {amt Vitin) = [nm Viki), i= 1, 9) 


SY given that n; = [n7;+4/nz,], i= 1, ag 


is the distribution of g independent pres of size ny, ..., ng on Y(Z,), . YŒ), 


respectively. Ja y(n); 
pendent samples. For æ in a closed bounded subset of E, we note that om 


[nz,+V/nz,]00 as N00. Thus all the conditions of Lemma 1 are satisfied. Further 
[n7,+ Vn zan tends to 775, uniformly in # in any closed bounded subset of Z,. Hence 
the conditional distributions of (ni (7), +++) Ny(%)) given that E (n) = æ converges in the 
UC* sense to the distribution ZN (0, A) with respect to 2 in any closed bounded 


sal a n(n) are the normalized means of these g inde- 


subset of H, 


Theorem 3 : 
the distribution MN(0, B) 


A 0 ) 
where B -( a ne (89) 


The joint distribution of (mil), .... Ny (70), E(n)) converges weakly to 
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m(l— r) — NT ste mT, | 
— TT 7(1—719) eas — TT; I 

and where C= | ve (3.3) 
mT, Tf, RC 7,(1—71y) 


Proof: This theorem is an immediate consequence of Theorem 2, Lemma 2 and the observation that the distribution of t(n) converges weakly to the distribution MN(0, C).


Proof of Theorem 1: Theorem 1 is contained in Theorem 3. 
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A METHOD OF CONSTRUCTION OF INCOMPLETE 
BLOCK DESIGNS 


By S. S. SHRIKHANDE 


Banaras Hindu University 
and 
D. RAGHAVARAO* 


Bombay University 


SUMMARY. A new method of constructing $\alpha$-resolvable and affine $\alpha$-resolvable balanced incomplete block (BIB) and partially balanced incomplete block (PBIB) designs is given. Two series of BIB designs with $b = 4(r-\lambda)$ are also obtained.


1. INTRODUCTION 


An incomplete block design with parameters v, b, r and k is an arrangement of v treatments in b blocks of size k (< v), such that no treatment occurs more than once in a block and further every treatment occurs in exactly r blocks. An incomplete block design is said to be balanced if every pair of treatments occurs in exactly $\lambda$ blocks. If we can define an association scheme for the treatments as given by Bose and Mesner (1959) and if the pair of treatments which are i-th associates occur together in $\lambda_i$ blocks (i = 1, 2, ..., m), then the design is called a PBIB design if all the $\lambda$'s are not equal. These designs were first introduced by Yates (1936-1937) and Bose and Nair (1939) respectively.

An incomplete block design with parameters v, b, r and k is said to be $\alpha$-resolvable (Shrikhande and Raghavarao, 1962) if the blocks can be divided into t sets $S_1, S_2, \ldots, S_t$ each of $\beta$ blocks, such that in each set every treatment is replicated $\alpha$ times. We then necessarily have

$v\alpha = k\beta,\quad r = \alpha t,\quad b = \beta t.$ ... (1.1)

An $\alpha$-resolvable incomplete block design will be said to be affine $\alpha$-resolvable, if every pair of blocks of the same set intersect in $q_1$ treatments whereas any pair of blocks from different sets intersect in $q_2$ treatments. A 1-resolvable and an affine 1-resolvable design may be simply called resolvable and affine resolvable design respectively.

We have shown that an $\alpha$-resolvable or an affine $\alpha$-resolvable design can be constructed, given two equireplicate designs $D_1$ and $D_2$ where $D_2$ is resolvable or affine resolvable and has the same number of blocks in each replication as the number of treatments in $D_1$. This method of construction is presented in the next section.


2. METHOD OF CONSTRUCTION

Let $D_1$ be an incomplete block design with parameters $v_1, b_1, r_1, k_1$ and $D_2$ be a resolvable incomplete block design with parameters $v_2 = k_2 v_1,\ b_2 = r_2 v_1,\ r_2,\ k_2$.

* The work of this author was financially supported by the Government of India Research Fellowship.

Let the blocks of the j-th replicate $S_j$ of $D_2$ be arbitrarily numbered $1, 2, \ldots, v_1$ ($j = 1, 2, \ldots, r_2$). Let $M = (m_{ij})$ be the usual incidence matrix of $D_1$, where $m_{ij} = 1$ or 0 according as the i-th treatment of $D_1$ occurs or does not occur in the j-th block of $D_1$. Let $M_j$ be the incidence matrix obtained from M by replacing each treatment i of $D_1$ by the set of treatments contained in the i-th block of $S_j$. It is then easy to see that

$N = (M_1, M_2, \ldots, M_{r_2})$ ... (2.1)

is the incidence matrix of an $\alpha$-resolvable incomplete block design D with parameters

$v = v_2,\quad b = b_1 r_2,\quad r = r_1 r_2,\quad k = k_1 k_2,$
$\alpha = r_1,\quad \beta = b_1,\quad t = r_2,$ ... (2.2)

where each set of blocks $M_j$ is an $\alpha$-replicate of D.
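The construction above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function names (`affine_plane_replicates`, `compose`, `pair_counts`) are hypothetical, $D_2$ is taken to be the affine plane EG(2, p) for a prime p, and $D_1$ is the trivial symmetric design formed by all pairs of 3 treatments. With these choices (2.2) gives v = 9, b = 12, r = 8, k = 6, and the pair count $r_1\lambda_2 + \lambda_1(r_2 - \lambda_2) = 2 + 3 = 5$.

```python
from collections import Counter
from itertools import combinations


def affine_plane_replicates(p):
    """Parallel classes of EG(2, p), p prime; point (x, y) is labelled x*p + y."""
    replicates = []
    for slope in range(p):  # lines y = slope*x + c, one parallel class per slope
        replicates.append([
            frozenset(x * p + (slope * x + c) % p for x in range(p))
            for c in range(p)
        ])
    # vertical lines x = c form the last parallel class
    replicates.append([frozenset(c * p + y for y in range(p)) for c in range(p)])
    return replicates


def compose(d1_blocks, d2_replicates):
    """Return the sets M_1, ..., M_{r2}: in M_j, treatment i of D1 is replaced
    by the i-th block of the j-th replicate of D2."""
    return [
        [frozenset().union(*(replicate[i] for i in block)) for block in d1_blocks]
        for replicate in d2_replicates
    ]


def pair_counts(blocks):
    """Number of blocks containing each unordered pair of treatments."""
    counts = Counter()
    for block in blocks:
        counts.update(combinations(sorted(block), 2))
    return counts


# D1: all pairs of 3 treatments (v1 = b1 = 3, r1 = k1 = 2, lambda1 = 1)
d1 = [(0, 1), (1, 2), (0, 2)]
# D2: EG(2, 3) (v2 = 9, b2 = 12, r2 = 4, k2 = 3, lambda2 = 1)
sets = compose(d1, affine_plane_replicates(3))
blocks = [b for s in sets for b in s]
print(len(blocks), {len(b) for b in blocks}, set(pair_counts(blocks).values()))
```

Each of the four sets returned by `compose` plays the role of one $M_j$; counting how often a treatment appears inside a single set checks the value of $\alpha$, and counting pairs over all blocks checks balance.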


We now consider some particular cases of the above construction. 


Case 1: $D_1$ and $D_2$ BIB designs. Let $D_1$ be a BIB design with parameters $v_1, b_1, r_1, k_1, \lambda_1$ and $D_2$ be a BIB design with parameters $v_2 = k_2 v_1,\ b_2 = r_2 v_1,\ r_2,\ k_2,\ \lambda_2$. Then as shown above we get a design D with parameters given by (2.2). Further, it is easy to verify that any two treatments of D occur together in exactly $r_1\lambda_2 + \lambda_1(r_2 - \lambda_2)$ blocks. Thus D is a BIB design. Hence we have


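Theorem 1: The existence of a BIB design $D_1$ with parameters

$v_1,\ b_1,\ r_1,\ k_1,\ \lambda_1$ ... (2.3)

and a resolvable BIB design with parameters

$v_2 = k_2 v_1,\quad b_2 = r_2 v_1,\quad r_2,\quad k_2,\quad \lambda_2$ ... (2.4)

implies the existence of an $\alpha$-resolvable BIB design D with parameters

$v = v_2,\quad b = b_1 r_2,\quad r = r_1 r_2,\quad k = k_1 k_2,$
$\lambda = r_1\lambda_2 + \lambda_1(r_2 - \lambda_2),\quad \alpha = r_1,\quad \beta = b_1,\quad t = r_2.$ ... (2.5)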


In particular, consider the case when $D_1$ is a symmetrical BIB design and $D_2$ is an affine resolvable BIB design. We note that any two blocks of $D_1$ intersect in $\lambda_1$ treatments and any two blocks of different replications in $D_2$ intersect in $k_2^2/v_2$ treatments (Bose, 1942). It is then easy to verify that in D any two blocks of the set $M_j$ intersect in $\lambda_1 k_2$ treatments and any two blocks from different sets $M_i$ and $M_j$ intersect in $k_1^2 k_2^2/v_2$ treatments. Thus D is an affine $\alpha$-resolvable BIB design. Since the parameters of an affine resolvable BIB design can be expressed in terms of two parameters $n \ge 2$, $m \ge 0$ (Bose, 1942), we have the following:


Theorem 2: The existence of a symmetrical BIB design with parameters

$v_1 = b_1 = n,\quad r_1 = k_1,\quad \lambda_1$ ... (2.6)

and an affine resolvable BIB design with parameters

$v_2 = nk_2 = n^2[(n-1)m+1],\quad b_2 = nr_2 = n(n^2 m + n + 1),\quad \lambda_2 = nm+1$ ... (2.7)

implies the existence of an affine $\alpha$-resolvable BIB design with parameters

$v = v_2,\quad b = b_2,\quad r = r_1 r_2,\quad k = k_1 k_2,\quad \lambda = r_1\lambda_2 + \lambda_1(r_2 - \lambda_2),$
$\alpha = r_1,\quad \beta = n,\quad t = r_2.$ ... (2.8)


If s and $p = s^2+s+1$ are both prime powers, then we can take $D_1$ to be PG(2, s) and $D_2$ to be EG(2, p). Hence we get the following:

Corollary: If s and $p = s^2+s+1$ are both prime powers, then we can construct an affine $\alpha$-resolvable BIB design with parameters

$v = p^2,\quad b = p(p+1),\quad r = (p+1)(s+1),\quad k = p(s+1),\quad \lambda = s+p+1,$
$\alpha = s+1,\quad \beta = p,\quad t = p+1.$ ... (2.9)

Case 2: $D_1$ a BIB design and $D_2$ a PBIB design. Let $D_1$ be a BIB design with parameters $v_1, b_1, r_1, k_1, \lambda_1$ and $D_2$ be a resolvable PBIB design with m associate classes having the association scheme A and with parameters

$v_2 = k_2 v_1,\quad b_2 = r_2 v_1,\quad r_2,\quad k_2,\quad \lambda_{21}, \lambda_{22}, \ldots, \lambda_{2m}.$ ... (2.10)

Then analogous to Theorems 1 and 2, we can prove the following theorem:

Theorem 3: The existence of a BIB design $D_1$ with parameters (2.3) and a resolvable PBIB design $D_2$ with parameters (2.10) implies the existence of an $\alpha$-resolvable PBIB design with the same association scheme A and with parameters

$v^* = v_2,\quad b^* = r_2 b_1,\quad r^* = r_1 r_2,\quad k^* = k_1 k_2,$
$\lambda_i^* = r_1\lambda_{2i} + \lambda_1(r_2 - \lambda_{2i}),\quad i = 1, 2, \ldots, m,$ ... (2.11)
$\alpha = r_1,\quad \beta = b_1,\quad t = r_2.$

If in particular $D_1$ is symmetrical and $D_2$ is affine resolvable, then D is affine $\alpha$-resolvable.
3. BIB DESIGNS OF THE FAMILY (A) WITH $b = 4(r-\lambda)$

Shrikhande (1962) has shown that BIB designs of the family (A) with $b = 4(r-\lambda)$ have an interesting reproducing property in that any two members of this family give rise to another member of the same family. In this section we give two series belonging to the family.

When 4m+3 is a prime power then we know that the following designs exist (Bose, 1939).

$D_1$: $v_1 = b_1 = 4m+3,\quad r_1 = k_1 = 2m+1,\quad \lambda_1 = m$
$D_2$: $v_2 = v_1^2,\quad b_2 = v_1(v_1+1),\quad r_2 = v_1+1,\quad k_2 = v_1,\quad \lambda_2 = 1.$

Using Theorem 2 we get

D: $v = v_1^2,\quad b = v_1(v_1+1),\quad r = \frac{v_1^2-1}{2},\quad k = \frac{v_1(v_1-1)}{2},\quad \lambda = \frac{(v_1+1)(v_1-2)}{4}$ ... (3.1)

which obviously belongs to the family (A).

Similarly if 4m+1 is a prime power, then the following designs exist:

$D_1$: $v_1 = 4m+1,\quad b_1 = 2v_1,\quad r_1 = 4m,\quad k_1 = 2m,\quad \lambda_1 = 2m-1$
$D_2$: $v_2 = v_1^2,\quad b_2 = v_1(v_1+1),\quad r_2 = v_1+1,\quad k_2 = v_1,\quad \lambda_2 = 1.$

Hence Theorem 1 gives

D: $v = v_1^2,\quad b = 2v_1(v_1+1),\quad r = v_1^2-1,\quad k = \frac{v_1(v_1-1)}{2},\quad \lambda = \frac{(v_1+1)(v_1-2)}{2}$ ... (3.2)

which again belongs to the family (A). It is interesting to note that (3.2) can also be obtained by the method of block intersection (Bose, 1939) from the symmetrical BIB design with parameters

$v = b = v_1^2 + (v_1+1)^2,\quad r = k = v_1^2,\quad \lambda = \frac{v_1(v_1-1)}{2}.$
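The claim that both series belong to the family (A) can be checked by direct arithmetic. The short Python sketch below is illustrative only: the function names `series_one` and `series_two` are hypothetical, and each function simply applies the parameter formulas of Theorems 2 and 1 (with $D_2$ taken as EG(2, $v_1$)) to the designs $D_1$ listed above. The identities are algebraic, so the check runs for every m, although the designs themselves are only asserted to exist when $v_1$ is a prime power.

```python
def series_one(m):
    """(v, b, r, k, lam) of (3.1): D1 symmetric with v1 = 4m + 3, D2 = EG(2, v1)."""
    v1, b1, r1, k1, lam1 = 4 * m + 3, 4 * m + 3, 2 * m + 1, 2 * m + 1, m
    r2, k2, lam2 = v1 + 1, v1, 1
    return v1 * v1, b1 * r2, r1 * r2, k1 * k2, r1 * lam2 + lam1 * (r2 - lam2)


def series_two(m):
    """(v, b, r, k, lam) of (3.2): D1 with v1 = 4m + 1, b1 = 2*v1, D2 = EG(2, v1)."""
    v1 = 4 * m + 1
    b1, r1, k1, lam1 = 2 * v1, 4 * m, 2 * m, 2 * m - 1
    r2, k2, lam2 = v1 + 1, v1, 1
    return v1 * v1, b1 * r2, r1 * r2, k1 * k2, r1 * lam2 + lam1 * (r2 - lam2)


for m in range(1, 6):
    for v, b, r, k, lam in (series_one(m), series_two(m)):
        assert b == 4 * (r - lam)           # membership of family (A)
        assert b * k == v * r               # counting incidences
        assert lam * (v - 1) == r * (k - 1)  # counting pairs through a treatment
print(series_one(1), series_two(1))
```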
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FIDUCIAL DISTRIBUTIONS ASSOCIATED WITH 
INDEPENDENT NORMAL VARIATES 


By D. A. SPROTT 
University of Waterloo 


SUMMARY. Several different fiducial distributions that have occurred in the literature, some of which are controversial, are closely related to the distribution of independent normal variates and owe their existence to a property of this distribution. This note brings these fiducial distributions together, comparing their derivations and contrasting some of their properties.


1. INTRODUCTION 

There have been several examples in the past of non-uniqueness of fiducial distributions. These arise from different methods of factoring the distribution of the sample, often, but not always, by means of a change of variable. Some of these different fiducial distributions are characterized uniquely by a set of transformations that will produce them (Fraser, 1961a, b), indicating that such transformations may form an essential logical aspect of the specification and information along with the likelihood function. For example, the bivariate normal distribution gives rise to two "regression" distributions, characterized by two different groups of transformations corresponding to x independent, y dependent variable and conversely; it also gives rise to a symmetric fiducial distribution (Fisher, 1956; Fraser, 1961a, b; Mauldon, 1955; Quenouille, 1958; Sprott, 1961).

Brillinger (1962) cites two examples of non-uniqueness arising from two different ways of factoring the distribution of the sufficient statistics without any change of variable involved, that is

$f(S_1, S_2; \theta_1, \theta_2) = f(S_2; \theta_2)\, f(S_1; \theta_1, \theta_2 \mid S_2)$
$= f(S_1; \theta_1)\, f(S_2; \theta_1, \theta_2 \mid S_1).$

An example of this for three parameters is provided again by the bivariate 
normal distribution, which can be written 

IS: So, r; Ti, Ta p) = f(S;, Sa: Ti Ta, Plr) f(r; p) 
= f(S1; 04) f(r; Tr P| S1) f(Sy; Tis Ca, p|7, Sy) 

It is the purpose of this note to bring together and discuss some different and apparently unrelated fiducial distributions, some of which are controversial, that owe their existence to the fact that this property is possessed by the simple example of two independent normal variates x, y with means $\mu$, $\nu$ and variances unity. The parameters of particular concern are the ratio $\alpha = \mu/\nu$ and the distance $\rho = \sqrt{\mu^2+\nu^2}$. Unlike the previous examples however, some of the results are not proper distributions. It will be found that one set of distributions, characterized by location transformations, are obtained by conditioning on y; another set, characterized in part by rotation transformations, are obtained by conditioning on $r = \sqrt{x^2+y^2}$.

2. INDEPENDENT NORMAL VARIATES : LOCATION TRANSFORMATIONS


1 1 ETA 
Here fle, y: u) = zesp -56n 5 w]; 


setting a = cjy, a = pjv, this can be written 


1 1 E oo! SS — by ia, vy). som (DL) 
=| 5-9 sallow | gyw] = flys ») flas a vl 


. . . st the 
The first factor leads to the usual fiducial normal distribution of v, the second to 


; x ; ity. T e simul- 
distribution of æ given v as normal with mean and variance unity. Thus the 
taneous distribution of g, vis 


1 a 1 BË ave . (2) 
pll exp [-5 0y) zU) |; 


. tj 
and integrating with respect to v produces the distribution of ce given by Creasy 
(1954) about which there is some controversy. 


A similar argument (conditioning on y) produces the distribution of p, 
r 
T exp [|--o] d exp (rp cos u)du, ve (8) 
about which there is some controversy (Stein, 1959; 


distributions are related to the usual fiducial 
derived by integrating it. 


James, 1954). Both of these 
distribution of fi, v and are usually 


All of the above distributions can be obtained empirically by sampling the 
normal distribution (a',y) and applying location transformations «ya! -+¢, = % 
yy +c, = y, HOUT = pë, yyte = v’, e.g. Fraser (1961 a, b) giving 

am u(e—r') ar LA a POEET — p 
EONS) TOMË p 
where pë, v’ are independent normal variates with means 2, y and variances unity. 


3. INDEPENDENT NORMAL VARIATES : ROTATION TRANSFORMATIONS
The distribution (1) 


can be integrated with respect to y by setting y = 7 sin t. 


YSPsinë, a= cott, and a= cot 0. This gives 
1 T ar 
f(r; p) = aq? eP [-4 (2-9) J exp (7p cos u)du 4 
and f(t; p, vj) = exp [rp cos(t—0)]/ r exp (rp cos u) du. „e (5) 
ò 


Thus the parent distribution can be conditioned on r 


Ha, y; æ, v) = f(r, t; P, 9) = f(r; p) f(ts p, Or). n (6) 
The first factor leads to a distribution (not proper) of p (James, 1954 : Stein, 1959), 
and the second to the distribution of 0 given p i l 
fO; t, r|p) = &XP 7P cos (0—t)) oe (rp cos u) du, oe (7 
0 
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This in fact produces the distribution of yt, v conditional on the circle ?--v? = pë 
(Fisher, 1956) and thus the distribution of æ given p. 


The product of (7) with f(g) = — 5 F(r, p) derived from (4) leads to a simul- 
taneous distribution of 0,p which does not produce the usual normal distribution of 
4, v (although combining (7) with f(p) given by (3) does). Thus the distribution of 
g,p produced this way is in conflict with the distribution (2) of æ, v. The conditional 
distribution of « = cot? can be found from (7) as 


9 I e lag 1 2 i = 5 d Š 
fel p) = 2 cosh rp Vite ral | +a?) i exp (rp cos u) du ~. (8) 


Multiplying this by f(p) above and integrating would give a new marginal distribution 
of æ different from before. Multiplying f(æ|p) above by the expression for f(p) derived 
previously (3) and integrating reproduces the Creasy (1954) distribution for x as in the 
preceding section. 

The conditional distributions $f(\alpha \mid \rho)$, $f(\theta \mid \rho)$ in this section can be obtained empirically by sampling the normal distribution conditional on a given circle and applying rotation transformations. That is, given the original pair of observations (x, y) = (r, t) ($x^2+y^2 = r^2$) the normal distribution could be sampled for a new pair (x', y') = (r, t') subject to $x'^2+y'^2 = r^2$. The rotation transformation $t' \to t = t' + (t-t')$ carries t' into t and $\theta = \cot^{-1} \mu/\nu$ into $\theta' = \theta + (t-t')$. Thus $\theta' = \cot^{-1} \alpha'$ is a random variable with distribution given by (7), so that $\alpha'$ is a random variable with conditional distribution given by (8). It does not appear that the marginal distribution of $\alpha$ can be given a frequency interpretation this way, as no such transformations seem applicable to r to produce the distribution of $\rho$ derived from (4).

4. DISCUSSION

It can be seen from the foregoing that the distribution of x, y can be factored in two ways (conditional on y, and conditional on r) giving rise to two sets of possible fiducial distributions of $\rho$, $\alpha$, $\mu$, $\nu$ etc. One of these sets is characterized by location transformations and can be entirely reproduced by them; the other is characterized by rotation transformations and can be reproduced partly (conditional on $\rho$) by them.

The question arises as to which of these sets of distributions has more claim to validity. Those produced in Section 2 by location transformations have many desirable consistency properties. They reproduce the usual normal fiducial distribution of $\mu$, $\nu$ and thus serve to preserve the symmetry with respect to the actual experiment whereby an interchange of x with $\mu$, y with $\nu$ transforms the distribution of x, y into the fiducial distribution of $\mu$, $\nu$. Thus it would seem reasonable that the logical status of x, y before the experiment for a known $\mu$, $\nu$ is identical to that of $\mu$, $\nu$ after the experiment for a known x, y. Then any statements appropriate to x, y, $x/y = a$ and $\sqrt{x^2+y^2} = r$ etc. before the experiment should correspond completely to similar statements about $\mu$, $\nu$, $\alpha$, $\rho$ etc. after, and should be derivable by the above interchange. In this respect the particular observation x = y = 0 would cause no more difficulty in making probability statements about $\alpha$ than would the knowledge that $\mu = \nu = 0$ in making statements about a (both giving rise to the Cauchy distribution). Also, all the distributions in Section 2 are proper distributions and can be produced empirically by sampling the normal distribution and applying location transformations. None of these points is true for the distributions arising in Section 3, although for a fixed $\rho$ the conditional distributions can be produced by rotation transformations. The distribution of $\rho$ in Section 3 seems to be considered more valid by some (James, 1954; Stein, 1959); some of the associated conditional distributions have occurred before in the literature in a different context (e.g. distributions on a circle, Fisher, 1956).


It is interesting to note that the commonly-accepted distribution of $\alpha$ based on the standard normal variate (Fieller, 1954)

$u = (x - \alpha y)/\sqrt{1+\alpha^2}$ ... (9)

apparently has none of the above properties, nor does it seem to be associated as closely with the parent normal distribution by the usual fiducial argument presented by Fisher (1956). In spite of the fact it is invariant under the group of rotations, it does not appear to be derivable from it in the manner of Fraser (1961a, b) and Sections 2, 3. For if x', y' are sampled from a normal distribution, they cannot necessarily be transformed into the original pair (x, y), the group of rotations not being transitive.
For this to be so, (x',y) must be sampled conditionally on the circle a/2-y'2 = 72 
= zty. Then visno longer a standard normal variate so that the resul 


y given by (8) and not (9). 
distributions F(p) in Section 3 and f(a) ( 


a 


t of applying 
It is curious that the 
9) for which some preference is shown in the 


Tt is possible that the distribution f(p) = Heo) obtained from (6) can be 
derived empirically by a set ( 


though not a group) 
described by Sprott (1963) 


of transformations in the manner 
- In any particular problem it would then be necessary 
f transformations were relevant, a: pointed out in Section 
) is in conflict with the location trans- 


; then vë must be zero (or less than pê). 
Thus location transformation: 


s and the resulting distribution (3) do not seem appro- 
cf. Barnard, 1963). 
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MULTIVARIATE STATISTICAL OUTLIERS* 
By S. S. WILKS 


Princeton University 


SUMMARY. This paper deals with the problem of identifying and testing a candidate set of a small number t of extreme sample elements as significant outliers in a sample of size n from a k-dimensional normal distribution with unknown parameters. The problem is considered in detail for t = 1, 2, 3, 4, that is, for sets of 1, 2, 3, and 4 outliers. The criterion for identifying and testing a single observation as a significant outlier is $r_1$ as defined in Section 3(b) and that for a pair of outliers is $r_2$ as defined in Section 4, small values of $r_1$ or $r_2$ being critical values. In the absence of exact values for the extremely complicated probabilities $P(r_1 < r)$ and $P(r_2 < r)$, upper bounds for these probabilities are given by (2.15) and (3.2) respectively. These upper bounds are suggested for a fortiori significance testing of observed values of $r_1$ and $r_2$. Some evidence of the closeness of these upper bounds obtained for the probabilities $P(r_1 < r)$ and $P(r_2 < r)$ is given in Table 1 for k = 1, that is, for a sample from a one-dimensional normal distribution. In this case exact values of $r_\alpha$ for which $P(r_1 < r_\alpha) = \alpha$ are available from Grubbs' (1950) tables for certain values of $\alpha$. These are compared with the upper bounds of $P(r_1 < r_\alpha)$ for several values of n in Table 1.

Values of $r_\alpha$ for which the upper bound of $P(r_1 < r_\alpha)$ has the value $\alpha$ are given in Table 2 for $\alpha$ = 0.010, 0.025, 0.050, 0.100; k = 1, 2, 3, 4, 5; and n = 5(1)30(5)100(100)500. Table 3 gives values of $r_\alpha$ for which the upper bound of $P(r_2 < r_\alpha)$ has the value $\alpha$ for the same values of $\alpha$, k and n.

Extension of $r_1$ and $r_2$ to the case of t outliers is $r_t$ as defined in Section 5. Expressions are given for the cases t = 3 and 4 from which values of $r_\alpha$ can be determined so that the upper bound of $P(r_t < r_\alpha)$ is $\alpha$. No tabulations have been made, however, for the cases of three and four outliers.

In the more general problem of t outliers a procedure is outlined as to how one could obtain the value of $r_\alpha$ for which the upper bound of $P(r_t < r_\alpha)$ has the value $\alpha$.


1. INTRODUCTION 


Studies of criteria for the rejection of extreme observations as significant 
outliers in a single sample from a one-dimensional normal distribution with unknown 
parameters have been made by various authors during the last thirty years. 


If $(x_1, \ldots, x_n)$ is a sample from such a distribution and if $\bar{x}$ and $s^2$ are the sample mean and sample variance, Thompson (1935) has determined the distribution of $(x_\xi - \bar{x})/s$ for an arbitrary $\xi$. He has proposed that for a given $\alpha$, values of $x_\xi$ for which $|x_\xi - \bar{x}|/s > \tau_\alpha$ be rejected as significant outliers in a sample from a normal distribution, where $\tau_\alpha$ is chosen so that for any $\xi$, $P(|x_\xi - \bar{x}|/s > \tau_\alpha) = \alpha$. He determined $\tau_\alpha$ for $\alpha = 0.2/n,\ 0.1/n,\ 0.05/n$ and for n = 3(1)22, 32, 42, 102, 202, 1002. Thus, for instance, if $\alpha = 0.10/n$ the expected number of observations which would be falsely rejected as outliers (that is, would be rejected if all elements of the sample were actually from the same normal distribution) would be 1 per 10 samples of size n.


* Research partially supported by the Office of Naval Research while the author was a Fellow of the Center for Advanced Study in the Behavioral Sciences in the Fall of 1961. Presented at the International Congress of Mathematicians, Stockholm, August 20, 1962.

Pearson and Chandra Sekar (1936) considered $(x_{(n)} - \bar{x})/s$ and $(\bar{x} - x_{(1)})/s$ as criteria for rejecting individual observations as significantly high and low outliers respectively, where $x_{(1)} < x_{(2)} < \ldots < x_{(n)}$ are the order statistics of the sample. In particular, they showed that the upper tail of the distribution of $(x_{(n)} - \bar{x})/s$ (or of $(\bar{x} - x_{(1)})/s$) has a density function $n f_n(\tau)$ on the interval $(\sqrt{(n-2)/2},\ \sqrt{n-1})$, where $f_n(\tau)$ is the probability density function of $(x_\xi - \bar{x})/s = \tau$, say. From this fact they found the upper 1%, 2.5% and 10% points of the distribution of $(x_{(n)} - \bar{x})/s$ (or of $(\bar{x} - x_{(1)})/s$) for values of n ranging from 11 to 19, that is, for all values of n such that the specified upper percentage point falls in the interval $(\sqrt{(n-2)/2},\ \sqrt{n-1})$.


Grubbs (1950) extended the work of Pearson and Chandra Sekar (1936) for individual outliers by actually determining the distribution of $(x_{(n)} - \bar{x})/s$ (or of $(\bar{x} - x_{(1)})/s$) in a sample from a normal distribution with unknown parameters. He tabulated the upper 1%, 2.5%, 5% and 10% points of the distribution of $(x_{(n)} - \bar{x})/s$ (or of $(\bar{x} - x_{(1)})/s$) for all $n \le 25$. He also tabulated the lower 1%, 2.5%, 5% and 10% points of the distribution of


E (e—a, 1] (or of È (Zn 1)s*]) for all n < 25 where Z, is the mean 
QË dj), «s.s Zen- and T, is the mean of Ua +++; Tene Grubbs also considered the case of 
two high (or two low) outliers, using as the criterion of rejection’ X (Eey — En, n-1)/ 
n 
[(n—2)s?](or 2) —2,,0)/{(n—2)s%]) where 7, ,, is the mean of (4), ses Gj gj and Bis 
is the mean of Me e Cen He tabulated the lower 1%, 2.5%, 5% and 10% points 
of the distribution of these quantities for all n < 20. He mentioned, but did not go 
into the details of E (eB, alin—2)o4 as a two-outlier test, where Tin is the mean 


of Va) së Un). 


Dixon (1951) has considered ratios of form (Bay tin) tinto) Tor 
(gn t)ar to), t= 1,2,3: j=1,2 
servations as outliers and he h 
points of the distributions of th 


» as criteria for testing extreme ob- 
as tabulated the 0.5%, 19%, 2%, 5%, 10(10)90%, 95% 
€se quantities. Dixon (1950) has also studied the power 
entioned above against alternatives in which it is assumed 
ormal distributions of form N(u-+-Ac, o?) or N(u, X20?) 
or unknown y and g2, 

All of the studies mentioned above deal with the problem of testing one or two extreme observations as significant outliers in a sample from a one-dimensional normal distribution with unknown parameters.

Problems of outliers in samples from normal distributions for which one or both of the parameters are known or are estimated from independent samples have been considered by various authors, including Irwin (1925), McKay (1935), Newman (1940), Nair (1948, 1952), David (1956), Pillai and Tienzo (1959) and Pillai (1950). Rider (1932) has given a survey of the literature on outliers prior to 1932.


The purpose of the present paper is to discuss in detail and present tables for 
the problem of selecting and testing one or two extreme observations as significant 
outliers in a sample from a multivariate normal distribution, with unknown parameters. 
The mathematical theory of selecting and testing three or more extreme observations 
as significant outliers is discussed, but no tables are given. 

No attempt has been made to study the power of the outlier tests discussed in this paper under various possible alternatives to the null hypothesis that all of the elements of the sample are independently drawn from a common k-dimensional normal distribution with unknown parameters. This would be a much more extensive investigation than the study of the tests presented in this paper under the null hypothesis. Such a study remains to be done. Some of the power properties of a test equivalent to $r_1$, the test for the problem of one outlier, have been investigated by Karlin and Truax (1960), and by Ferguson (1961).


2. THE CASE OF A SINGLE OUTLIER 


(a) The one-outlier scatter ratios of a sample. Let $(x_{1\xi}, \ldots, x_{k\xi};\ \xi = 1, \ldots, n)$ be a sample of size n from a k-dimensional normal distribution $N(\{\mu_i\}, \|\sigma_{ij}\|)$ where $\{\mu_i\}$ is the vector of means $(\mu_1, \ldots, \mu_k)$ and $\|\sigma_{ij}\|$ is the covariance matrix of the distribution. It is assumed that the vector of means and covariance matrix of the distribution are unknown. Let $(\bar{x}_1, \ldots, \bar{x}_k)$ be the vector of sample means, where $n\bar{x}_i = \sum_{\xi=1}^{n} x_{i\xi}$, and let

$a_{ij} = \sum_{\xi=1}^{n} (x_{i\xi} - \bar{x}_i)(x_{j\xi} - \bar{x}_j);\quad i, j = 1, \ldots, k.$ ... (2.1)
The sample can be represented as a cluster of n points in a k-dimensional euclidean space $R_k$. Any k of these n points together with the sample center of gravity point $(\bar{x}_1, \ldots, \bar{x}_k)$ forms a simplex. If the volume of this simplex is squared and if the sum of squares is taken of the volumes of all possible simplexes which can be formed in this manner, it can be shown (see Wilks, 1962, for instance) that this sum of squared volumes is

$\frac{1}{(k!)^2}\, |a_{ij}|,$ ... (2.2)

where $|a_{ij}|$ is the determinant of the matrix $\|a_{ij}\|$. It is convenient to call $|a_{ij}|$ the internal scatter of the sample $(x_{1\xi}, \ldots, x_{k\xi};\ \xi = 1, \ldots, n)$; if n > k, $|a_{ij}| > 0$ with probability 1.

If we delete the $\xi$-th element of the sample we obtain a cluster of n-1 points in $R_k$. Let the internal scatter of these n-1 points be $|a_{ij\xi}|$, which will be > 0 with probability 1 if n > k+1.

Let

$R_\xi = \frac{|a_{ij\xi}|}{|a_{ij}|},\quad \xi = 1, \ldots, n.$ ... (2.3)

The quantities $R_1, \ldots, R_n$ will be called one-outlier scatter ratios of the sample $(x_{1\xi}, \ldots, x_{k\xi};\ \xi = 1, \ldots, n)$.

It can be verified that

$a_{ij\xi} = a_{ij} - b_{i\xi} b_{j\xi}$ ... (2.4)

where $b_{i\xi} = \sqrt{\frac{n}{n-1}}\,(x_{i\xi} - \bar{x}_i)$.

Thus we have

$|a_{ij\xi}| = |a_{ij} - b_{i\xi} b_{j\xi}| = |a_{ij}| \left(1 - \sum_{i,j=1}^{k} a^{ij} b_{i\xi} b_{j\xi}\right),$ ... (2.5)

where $\|a^{ij}\| = \|a_{ij}\|^{-1}$.

Hence $R_\xi = 1 - \sum_{i,j=1}^{k} a^{ij} b_{i\xi} b_{j\xi}$,

and since $\sum_{\xi=1}^{n} \sum_{i,j=1}^{k} a^{ij} b_{i\xi} b_{j\xi} = \frac{nk}{n-1}$,

we have $\sum_{\xi=1}^{n} R_\xi = n\left(1 - \frac{k}{n-1}\right).$ ... (2.6)

Now, it is known in multivariate statistical analysis (see Wilks (1962), for example) that for any $\xi$ the ratio $R_\xi$ has the beta distribution $Be\left(\frac{n-k-1}{2}, \frac{k}{2}\right)$, where a random variable z is said to have the beta distribution $Be(\nu_1, \nu_2)$ if the probability density function of z is

$f(z) = \frac{\Gamma(\nu_1+\nu_2)}{\Gamma(\nu_1)\Gamma(\nu_2)}\, z^{\nu_1-1}(1-z)^{\nu_2-1}$ ... (2.7)

on the interval (0, 1) and f(z) = 0 outside the interval.


Under the null hypothesis (that is, assumi 
are independently drawn from a common k-dimens 
outlier scatter ratios BEAR 


symmetric over the n-dimen: 


ng that all elements of the sample 
ional normal distribution), the one- 
n are random variables having a distribution which is 
sional space of TE R,, for which 


2 Ao 
ahos: Let Ray < sax Bi 
n The criterion we propose for selecting and testing 


(b) The ordered 
be the ordered values of Rpa R 


15 min {R}. It is the 
ier with 7, being the test criterion, 
ose In the left tail of its distribution. 


one to be tested as a significant ou 


Da It is evident 
that the critical values of rı are th 
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The joint distribution of Ri), ..., Ry), or even of R,, ..., R, for that matter, 
is very complicated. However, one can readily obtain moments of any one of the random variables $R_1, \ldots, R_n$ and also certain low joint moments of two or more of these random variables. For instance,

$E(R_\xi) = 1 - \frac{k}{n-1},\quad \mathrm{var}(R_\xi) = \frac{2k(n-k-1)}{(n-1)^2(n+1)},$ ... (2.8)

$\mathrm{cov}(R_\xi, R_\eta) = -\frac{2k(n-k-1)}{(n-1)^3(n+1)}.$ ... (2.9)


Even though it does not appear feasible to determine exact percentage points 
in the lower tail of the distribution of 7, except for k = 1 and then only for small values 
of n as we shall see later, we can determine upper bounds for the amount of probability 
in the lower tail of the distribution of r, which should be useful, at least for small 
values of k and small percentage points, for a fortiori significance testing of r,. 


First let us examine the lower limits of the ranges of $R_{(1)}, \ldots, R_{(n)}$. If we consider the space of $(R_1, \ldots, R_n)$, remembering that $R_1, \ldots, R_n$ must each lie on the interval (0, 1), it will be seen that not more than $n-k-1$ of the $R$'s in the set $\{R_1, \ldots, R_n\}$ can be 1 simultaneously. For if more than $n-k-1$ of them were equal to 1, the average of the remaining $R$'s in this set would be negative. This means that $R_{(1)}$ would be negative, contrary to the fact that each $R$ in the set $\{R_1, \ldots, R_n\}$ must lie on the interval (0, 1). Thus if $n-k-1$ $R$'s in the set $\{R_1, \ldots, R_n\}$ are simultaneously equal to 1, we would have

$$R_{(1)} + \cdots + R_{(k+1)} = (k+1) - \frac{nk}{n-1},$$

and hence the average of $R_{(1)}, \ldots, R_{(k+1)}$ would be $1 - \frac{nk}{(k+1)(n-1)}$, which implies that $R_{(k+1)} \geq 1 - \frac{nk}{(k+1)(n-1)}$.

Similarly, if we put $n-k-2$ of the $R$'s in the set $\{R_1, \ldots, R_n\}$ equal to 1, it will be seen that $R_{(k+2)} \geq 1 - \frac{nk}{(k+2)(n-1)}$. Continuing this process, if we put only one $R$ in the set $\{R_1, \ldots, R_n\}$ equal to 1, we obtain

$$R_{(n-1)} \geq 1 - \frac{nk}{(n-1)^2}.$$

Note that it is possible for $R_{(1)}, \ldots, R_{(k)}$ to be 0 simultaneously, in which case $R_{(k+1)} \geq 1 - \frac{nk}{(k+1)(n-1)}$. Therefore for left-hand end points of the distribution of $R_{(1)}, \ldots, R_{(n)}$ we have:

$$\begin{aligned}
R_{(1)} &\geq 0, \ \ldots, \ R_{(k)} \geq 0, \\
R_{(k+1)} &\geq 1 - \frac{nk}{(k+1)(n-1)}, \\
&\ \ \vdots \\
R_{(n-1)} &\geq 1 - \frac{nk}{(n-1)^2}, \\
R_{(n)} &\geq 1 - \frac{k}{n-1}.
\end{aligned} \qquad (2.10)$$
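The end points in (2.10) can be illustrated numerically. The sketch below assumes the general pattern $R_{(j)} \geq \max\{0,\ 1 - nk/(j(n-1))\}$ read off from the argument above, and, for the one-dimensional case only, it assumes that the internal scatter of a cluster is the sum of squared deviations about the cluster mean. Both function names are illustrative.

```python
from fractions import Fraction

def lower_limits(n: int, k: int):
    """Left-hand end points of R_(1), ..., R_(n), following the pattern of (2.10)."""
    limits = []
    for j in range(1, n + 1):
        bound = 1 - Fraction(n * k, j * (n - 1))
        limits.append(max(Fraction(0), bound))   # bound is negative for j <= k
    return limits

def scatter_ratios_1d(xs):
    """One-outlier scatter ratios R_1, ..., R_n for a one-dimensional sample (k = 1).

    Assumption: internal scatter = sum of squared deviations about the mean.
    """
    def sum_sq(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values)

    total = sum_sq(xs)
    # Delete each element in turn and compare the remaining scatter with the total.
    return [sum_sq(xs[:i] + xs[i + 1:]) / total for i in range(len(xs))]
```

With the sample 1, 2, 4, 7, 11 the five ratios add up to $n(n-k-1)/(n-1) = 15/4$, and the ordered ratios all lie above the limits 0, 3/8, 7/12, 11/16, 3/4 returned by `lower_limits(5, 1)`.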
(c) Upper bound for $P(r_1 < r)$. For a fixed number $r$ let us consider the problem of finding an upper bound for $P(r_1 < r)$. Let $E_1, \ldots, E_n$ denote the events for which $R_1 < r, \ldots, R_n < r$ respectively. Then

$$P(r_1 < r) = P(E_1 \cup \cdots \cup E_n). \qquad (2.11)$$

But

$$P(E_1 \cup \cdots \cup E_n) \leq P(E_1) + \cdots + P(E_n), \qquad (2.12)$$

and

$$P(E_1) = \cdots = P(E_n) = P(u < r), \qquad (2.13)$$

where

$$P(u < r) = \frac{\Gamma\left(\frac{n-1}{2}\right)}{\Gamma\left(\frac{n-k-1}{2}\right)\Gamma\left(\frac{k}{2}\right)} \int_0^r u^{\frac{n-k-1}{2}-1}(1-u)^{\frac{k}{2}-1}\,du, \qquad (2.14)$$

since, as stated in Section 2(a), $u$ is a random variable having the beta distribution $B\left(\frac{n-k-1}{2}, \frac{k}{2}\right)$.

Therefore

$$P(r_1 < r) \leq nP(u < r), \qquad (2.15)$$

that is, $nP(u < r)$ is an upper bound for $P(r_1 < r)$. In particular, if we choose $r = r_\alpha$ so that $nP(u < r_\alpha) = \alpha$ we obtain

$$P(r_1 < r_\alpha) \leq \alpha. \qquad (2.16)$$
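The bound (2.15) and the value $r_\alpha$ solving $nP(u < r_\alpha) = \alpha$ can be computed as in the sketch below. It is not the author's procedure: it assumes $u \sim B\left(\frac{n-k-1}{2}, \frac{k}{2}\right)$, evaluates the integral in (2.14) by Simpson's rule after the substitution $u = t^2$ (my choice, to avoid the end-point singularity at 0), and finds $r_\alpha$ by bisection. All function names are illustrative.

```python
import math

def beta_cdf_half(n: int, k: int, r: float, steps: int = 4000) -> float:
    """P(u < r) for u ~ Beta((n-k-1)/2, k/2), 0 <= r < 1, by Simpson's rule."""
    a, b = (n - k - 1) / 2, k / 2
    log_const = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    upper = math.sqrt(r)          # integration limit after u = t**2
    h = upper / steps             # steps must be even for Simpson's rule

    def f(t):
        # Integrand of (2.14) after the substitution u = t**2.
        return 2.0 * t ** (n - k - 2) * (1.0 - t * t) ** (b - 1.0)

    total = f(0.0) + f(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return math.exp(log_const) * total * h / 3.0

def one_outlier_bound(n: int, k: int, r: float) -> float:
    """Upper bound n * P(u < r) of P(r_1 < r), as in (2.15)."""
    return n * beta_cdf_half(n, k, r)

def critical_value(n: int, k: int, alpha: float, tol: float = 1e-10) -> float:
    """Solve n * P(u < r) = alpha for r by bisection on (0, 1)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if one_outlier_bound(n, k, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

For $n = 5$, $k = 2$ the beta distribution is uniform on (0, 1), so `critical_value(5, 2, 0.010)` is 0.002, in agreement with the corresponding entry of Table 2.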

(d) The upper bound $nP(u < r)$ as the expected number of scatter ratios with values $< r$. The quantity $nP(u < r)$ has another useful interpretation. Suppose $\delta_\xi$ is a random variable which has the value 1 if $R_\xi < r$ and 0 otherwise, $\xi = 1, \ldots, n$. Let

$$N(r) = \sum_{\xi=1}^{n} \delta_\xi, \qquad (2.17)$$

that is, $N(r)$ is the number of the one-outlier scatter ratios which have values less than $r$. We have

$$E(N(r)) = E(\delta_1) + \cdots + E(\delta_n) = nP(u < r), \qquad (2.18)$$

since $E(\delta_\xi) = P(u < r)$, $\xi = 1, \ldots, n$. Thus, the expected number of the one-outlier scatter ratios $R_1, \ldots, R_n$ having values less than $r$ is equal to the upper bound $nP(u < r)$ of $P(r_1 < r)$. In particular, we have

$$E(N(r_\alpha)) = nP(u < r_\alpha) = \alpha. \qquad (2.19)$$

(e) Comparison of values of upper bound of $P(r_1 < r)$ with Grubbs' exact values of $P(r_1 < r)$ for a sample from a one-dimensional normal distribution. For the case $k = 1$ it will be seen from (2.10) that $R_{(1)} \geq 0$ (i.e. $r_1 \geq 0$), and $R_{(2)} \geq 1 - \frac{n}{2(n-1)}$. Hence for any value of $r$ on the interval $\left(0,\ 1 - \frac{n}{2(n-1)}\right)$ the expression (2.15) is an equality. For a value of $r$ which exceeds $1 - \frac{n}{2(n-1)}$ expression (2.15) is a strict inequality. In this case $r_1$ is the smaller of the two quantities

$$\sum_{\xi=1}^{n-1} (x_\xi - \bar{x}_n)^2 / [(n-1)s^2] \quad \text{and} \quad \sum_{\xi=2}^{n} (x_\xi - \bar{x}_1)^2 / [(n-1)s^2]$$

which were considered by Grubbs (1950) as criteria for upper and lower outliers in a sample from a one-dimensional distribution. Thus, if $r_\alpha$ is the lower $100\alpha\%$ point of $r_1$ and lies on the interval $\left(0,\ 1 - \frac{n}{2(n-1)}\right)$ it is the lower $100\frac{\alpha}{2}\%$ point of each of the two criteria considered by Grubbs. For the case $k = 1$ Table 1 gives a comparison between the probability $P(r_1 < r_\alpha)$ and its upper bound $nP(u < r_\alpha)$ for $\alpha$ = 0.02, 0.05, 0.10 and 0.20 and for certain values of $n$ from Grubbs' tables for which inequality (2.15) is a strict inequality.


TABLE 1. COMPARISON OF $P(r_1 < r_\alpha)$ WITH ITS UPPER BOUND $nP(u < r_\alpha)$

 n     α      r_α      exact probability       upper bound nP(u < r_α)
                       P(r_1 < r_α)            of P(r_1 < r_α)
                       (by Grubbs)             [or equivalently, E(N(r_α))]

20    .02    .5393    .020                    .020
25    .02    .6071    .020                    .021

15    .05    .5030    .050                    .050
20    .05    .5937    .050                    .050
25    .05    .6544    .050                    .052

15    .10    .5558    .100                    .100
20    .10    .6379    .100                    .100
25    .10    .6922    .100                    .103

10    .20    .4881    .200                    .200
15    .20    .6134    .200                    .200
20    .20    .6848    .200                    .206
25    .20    .7319    .200                    .210


(f) Tables of values of $r_\alpha$ for which upper bound $nP(u < r_\alpha) = \alpha$. For the case $k \geq 2$ it will be seen from (2.10) that the left hand endpoints of the distributions of $R_{(1)}, \ldots, R_{(k)}$ are all 0. Therefore for $k \geq 2$ expression (2.15) is a strict inequality; and there exists no value of $r$ for which $nP(u < r)$ provides an exact value of $P(r_1 < r)$. The problem of determining exact values of $P(r_1 < r)$ for $k \geq 2$ does not seem feasible at present because of the complexity of the distribution of $r_1$. We therefore resort to the use of the upper bound $nP(u < r)$.

Table 2 gives values of $r_\alpha$ for which the upper bound $nP(u < r_\alpha)$ of $P(r_1 < r_\alpha)$ has the value $\alpha$ [or equivalently, values of $r_\alpha$ for which $E(N(r_\alpha)) = \alpha$] for $\alpha$ = 0.010, 0.025, 0.050, 0.100; $k$ = 1, 2, 3, 4, 5; and $n$ = 5(1)30(5)100(100)500.


3. THE CASE OF TWO OUTLIERS 
Suppose we delete two elements, say $(x_{1\xi}, \ldots, x_{k\xi})$ and $(x_{1\eta}, \ldots, x_{k\eta})$, from the sample defined in Section 2 and denote the internal scatter of the resulting cluster of $n-2$ points by $|a_{ij\xi\eta}|$, which is positive with probability 1 if $n > k+2$. Let

$$R_{\xi\eta} = \frac{|a_{ij\xi\eta}|}{|a_{ij}|}, \qquad \eta > \xi = 1, \ldots, n. \qquad (3.1)$$

The quantities $\{R_{\xi\eta}\}$ will be called two-outlier scatter ratios of the sample $(x_{1\xi}, \ldots, x_{k\xi};\ \xi = 1, \ldots, n)$. The conditions satisfied by the $\{R_{\xi\eta}\}$, except that each must lie on (0, 1), appear rather complicated and no attempt will be made here to state them.

It can be shown (see Wilks (1962), for instance) that for $n > k+2$ each of the $\binom{n}{2}$ scatter ratios in the set $\{R_{\xi\eta}\}$ has the property that its distribution is identical with that of a random variable $u^2$ where $u$ has the beta distribution $B(n-k-2, k)$.

Let $r_2 = \min_{\eta > \xi} \{R_{\xi\eta}\}$. The criterion proposed here for selecting and testing the strongest candidate pair of sample elements as significant outliers is $r_2$, that is, the candidate pair whose deletion in computing two-outlier scatter ratios produces the smallest scatter ratio.

No attempt is made here to give inequalities for these ordered scatter ratios analogous to those for the $\{R_{(1)}, \ldots, R_{(n)}\}$ as given in (2.10).

Under the null hypothesis (that is, assuming that all elements in the sample are independently drawn from a common $k$-dimensional normal distribution) the joint distribution of $\{R_{\xi\eta};\ \eta > \xi = 1, \ldots, n\}$ is symmetric in the $R_{\xi\eta}$, although apparently very complicated. However, an upper bound for the probability $P(r_2 < r)$ can be found by a procedure similar to that by which (2.15) was established, namely

$$P(r_2 < r) \leq \binom{n}{2} P(u^2 < r), \qquad (3.2)$$

where

$$P(u^2 < r) = \frac{\Gamma(n-2)}{\Gamma(n-k-2)\Gamma(k)} \int_0^{\sqrt{r}} u^{n-k-3}(1-u)^{k-1}\,du, \qquad (3.3)$$

remembering that each $R_{\xi\eta}$ is a random variable having a distribution identical to that of a random variable $u^2$ where $u$ has the beta distribution $B(n-k-2, k)$.

In particular if we choose $r_\alpha$ such that

$$\binom{n}{2} P(u^2 < r_\alpha) = \alpha \qquad (3.4)$$

we have

$$P(r_2 < r_\alpha) \leq \alpha. \qquad (3.5)$$

As in the one-outlier problem, if we let $N(r)$ be the number of the $\binom{n}{2}$ two-outlier scatter ratios $\{R_{\xi\eta}\}$ which have values less than $r$, then

$$E(N(r)) = \binom{n}{2} P(u^2 < r). \qquad (3.6)$$

In particular, we have

$$E(N(r_\alpha)) = \binom{n}{2} P(u^2 < r_\alpha) = \alpha. \qquad (3.7)$$

Values of $\sqrt{r_\alpha}$ for which the upper bound $\binom{n}{2} P(u^2 < r_\alpha)$ of $P(r_2 < r_\alpha)$ has the value $\alpha$ [or equivalently, values of $\sqrt{r_\alpha}$ for which $E(N(r_\alpha)) = \alpha$] are given in Table 3 for $\alpha$ = 0.010, 0.025, 0.050, 0.100; $k$ = 1, 2, 3, 4, 5; and $n$ = 5(1)30(5)100(100)500.
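The two-outlier bound and the tabulated quantity $\sqrt{r_\alpha}$ can be sketched as follows. The sketch assumes $u \sim B(n-k-2, k)$ with $n > k+2$; because both parameters are then positive integers, the beta distribution function can be written as a finite binomial sum, which is my choice of method and not necessarily how Table 3 was computed. The function names are illustrative.

```python
import math

def beta_cdf_int(a: int, b: int, x: float) -> float:
    """P(u < x) for u ~ Beta(a, b) with positive integer a and b.

    Uses the identity I_x(a, b) = sum_{j=a}^{a+b-1} C(a+b-1, j) x^j (1-x)^(a+b-1-j).
    """
    m = a + b - 1
    return sum(math.comb(m, j) * x ** j * (1.0 - x) ** (m - j)
               for j in range(a, m + 1))

def two_outlier_bound(n: int, k: int, r: float) -> float:
    """Upper bound C(n, 2) * P(u**2 < r) of P(r_2 < r), as in (3.2)."""
    return math.comb(n, 2) * beta_cdf_int(n - k - 2, k, math.sqrt(r))

def sqrt_critical_value(n: int, k: int, alpha: float, tol: float = 1e-12) -> float:
    """Solve C(n, 2) * P(u < s) = alpha for s = sqrt(r_alpha) by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if math.comb(n, 2) * beta_cdf_int(n - k - 2, k, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

For $n = 5$, $k = 1$ the distribution function of $u$ is $x^2$, so $\sqrt{r_\alpha} = \sqrt{\alpha/10}$; with $\alpha = 0.010$ this gives about 0.03162, which matches the first entry of Table 3.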
4. THE CASE OF THREE OR MORE OUTLIERS
The scatter ratio criteria for selecting and testing outliers can be extended to the case of three or more outliers in a fairly straightforward way.

For $t$ outliers, we define the $\binom{n}{t}$ $t$-outlier scatter ratios as

$$R_{\xi_1 \cdots \xi_t} = \frac{|a_{ij\xi_1 \cdots \xi_t}|}{|a_{ij}|}, \qquad (4.1)$$

$\xi_1 < \cdots < \xi_t = 1, \ldots, n$, where $|a_{ij\xi_1 \cdots \xi_t}|$ is the internal scatter of the $n-t$ points remaining in the sample after deletion of $(x_{1\xi_1}, \ldots, x_{k\xi_1}), \ldots, (x_{1\xi_t}, \ldots, x_{k\xi_t})$. The scatter ratio $R_{\xi_1 \cdots \xi_t}$ is positive with probability 1 if $n > k+t$. The smallest of these scatter ratios, which we denote by $r_t$, is the proposed criterion for selecting the $t$ most extreme observations in the sample and for testing this set of $t$ observations as a set of significant outliers.


Under the assumption that the $n$ elements in the sample are independently drawn from a common $k$-dimensional normal distribution any one of the scatter ratios, say $R_{\xi_1 \cdots \xi_t}$, is a random variable whose $h$-th moment is given by (see Wilks (1962))

$$E(R^h_{\xi_1 \cdots \xi_t}) = \prod_{i=1}^{t} \frac{\Gamma\left(\frac{n-i}{2}\right)\Gamma\left(\frac{n-k-i}{2}+h\right)}{\Gamma\left(\frac{n-k-i}{2}\right)\Gamma\left(\frac{n-i}{2}+h\right)}, \qquad h = 0, 1, 2, \ldots \qquad (4.2)$$

Note that the $h$-th moment of $R_{\xi_1 \cdots \xi_t}$ is identical with the $h$-th moment of the product $z_1 \cdots z_t$ where $z_1, \ldots, z_t$ are independent random variables having beta distributions $B\left(\frac{n-k-1}{2}, \frac{k}{2}\right), \ldots, B\left(\frac{n-k-t}{2}, \frac{k}{2}\right)$, respectively. The distribution of $R_{\xi_1 \cdots \xi_t}$ is uniquely determined by its moments (see Cramér (1943)). Hence the distribution of $R_{\xi_1 \cdots \xi_t}$ is identical with the distribution of the product $z_1 \cdots z_t$ and hence

$$P(R_{\xi_1 \cdots \xi_t} < r) = P(z_1 \cdots z_t < r). \qquad (4.3)$$

As in the one- and two-outlier problems we find that

$$P(r_t < r) \leq \binom{n}{t} P(z_1 \cdots z_t < r), \qquad (4.4)$$

the probability $P(z_1 \cdots z_t < r)$ to be determined from the joint distribution of $z_1, \ldots, z_t$ described above.


If $N(r)$ is the number of the $\binom{n}{t}$ scatter ratios in $\{R_{\xi_1 \cdots \xi_t}\}$ which are less than $r$ we have, as in the one- and two-outlier cases,

$$E(N(r)) = \binom{n}{t} P(z_1 \cdots z_t < r). \qquad (4.5)$$

If $r_\alpha$ is chosen so that $\binom{n}{t} P(z_1 \cdots z_t < r_\alpha) = \alpha$ then

$$P(r_t < r_\alpha) \leq \binom{n}{t} P(z_1 \cdots z_t < r_\alpha) = E(N(r_\alpha)) = \alpha. \qquad (4.6)$$

As a matter of fact, the probability $P(z_1 \cdots z_t < r)$ can be reduced to a probability involving fewer than $t$ independent beta variables if $t > 1$. More precisely, if $t$ is even $P(z_1 \cdots z_t < r)$ reduces to an expression involving $\frac{1}{2}t$ independent beta variables, and if $t$ is odd it reduces to one involving $\frac{1}{2}(t+1)$ independent beta variables. In the case of two outliers $P(z_1 z_2 < r)$ reduces to $P(u^2 < r)$ as given by (3.3). We shall now consider the cases of three and four outliers.
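The reduction for two outliers can be checked numerically through moments. The sketch below assumes the product form of the $h$-th moment in (4.2), with $z_i \sim B\left(\frac{n-k-i}{2}, \frac{k}{2}\right)$, and compares it for $t = 2$ with the $h$-th moment of $u^2$ where $u \sim B(n-k-2, k)$. The function names are illustrative.

```python
import math

def log_beta_moment(a: float, b: float, h: float) -> float:
    """log E(z**h) for z ~ Beta(a, b)."""
    return (math.lgamma(a + h) + math.lgamma(a + b)
            - math.lgamma(a) - math.lgamma(a + b + h))

def product_moment(n: int, k: int, t: int, h: float) -> float:
    """h-th moment of z_1 ... z_t with z_i ~ Beta((n-k-i)/2, k/2), as in (4.2)."""
    return math.exp(sum(log_beta_moment((n - k - i) / 2, k / 2, h)
                        for i in range(1, t + 1)))

def squared_beta_moment(n: int, k: int, h: float) -> float:
    """h-th moment of u**2 with u ~ Beta(n-k-2, k), i.e. E(u**(2h))."""
    return math.exp(log_beta_moment(n - k - 2, k, 2 * h))
```

For any admissible $n$ and $k$ the two functions agree for $t = 2$ up to rounding error, which is the moment identity behind the statement that $z_1 z_2$ is distributed as $u^2$.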


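In the three-outlier problem, by making use of the relation

$$\sqrt{\pi}\,\Gamma(2m) = 2^{2m-1}\,\Gamma(m)\,\Gamma(m+\tfrac{1}{2}) \qquad (4.7)$$

in (4.2) for $t = 3$, we find

$$E(R^h_{\xi_1\xi_2\xi_3}) = \frac{\Gamma(n-2)\,\Gamma\left(\frac{n-3}{2}\right)\Gamma(n-k-2+2h)\,\Gamma\left(\frac{n-k-3}{2}+h\right)}{\Gamma(n-k-2)\,\Gamma\left(\frac{n-k-3}{2}\right)\Gamma(n-2+2h)\,\Gamma\left(\frac{n-3}{2}+h\right)} \qquad (4.8)$$

from which it is seen that the distribution of $R_{\xi_1\xi_2\xi_3}$ is identical with that of the product $u^2 v$ where $u$ and $v$ are independent random variables having beta distributions $B(n-k-2, k)$ and $B\left(\frac{n-k-3}{2}, \frac{k}{2}\right)$, respectively. Therefore,

$$P(R_{\xi_1\xi_2\xi_3} < r) = P(u^2 v < r), \qquad (4.9)$$

and denoting $\min_{\xi_3 > \xi_2 > \xi_1} \{R_{\xi_1\xi_2\xi_3}\}$ by $r_3$ we have

$$P(r_3 < r) \leq \binom{n}{3} P(u^2 v < r) \qquad (4.10)$$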


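where, omitting details, we find

$$P(u^2 v < r) = \frac{\Gamma(n-2)\,\Gamma\left(\frac{n-3}{2}\right)}{\Gamma(n-k-2)\,\Gamma(k)\,\Gamma\left(\frac{n-k-3}{2}\right)\Gamma\left(\frac{k}{2}\right)} \iint_{u^2 v < r} u^{n-k-3}(1-u)^{k-1}\, v^{\frac{n-k-5}{2}}(1-v)^{\frac{k}{2}-1}\,du\,dv, \qquad (4.11)$$

the integration extending over that part of the unit square $0 < u < 1$, $0 < v < 1$ on which $u^2 v < r$. For $k = 1$ this expression reduces to

$$P(u^2 v < r) = \frac{2\,\Gamma\left(\frac{n-1}{2}\right)}{\sqrt{\pi}\,\Gamma\left(\frac{n-4}{2}\right)} \iint_{u^2 v < r} u^{n-4}\, v^{\frac{n-6}{2}}(1-v)^{-\frac{1}{2}}\,du\,dv \qquad (4.12)$$

and for $k = 2$ it reduces to

$$P(u^2 v < r) = \frac{(n-3)!}{2\,(n-6)!}\,(\sqrt{r})^{n-5}\left[\frac{1}{n-5} - \frac{2\sqrt{r}}{n-4} + \frac{r}{n-3}\right]. \qquad (4.13)$$

If we choose $r_\alpha$ so that

$$\binom{n}{3} P(u^2 v < r_\alpha) = \alpha, \qquad (4.14)$$

where $P(u^2 v < r)$ is given by (4.11), we have

$$P(r_3 < r_\alpha) \leq \alpha.$$

If $N(r)$ is the number of the scatter ratios in the set $\{R_{\xi_1\xi_2\xi_3}\}$ having values $< r$ we note that $E(N(r_\alpha)) = \alpha$.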


In the four-outlier case by making use of (4.7) in (4.2) for $t = 4$ we find

$$E(R^h_{\xi_1\xi_2\xi_3\xi_4}) = \frac{\Gamma(n-2)\,\Gamma(n-4)\,\Gamma(n-k-2+2h)\,\Gamma(n-k-4+2h)}{\Gamma(n-k-2)\,\Gamma(n-k-4)\,\Gamma(n-2+2h)\,\Gamma(n-4+2h)} \qquad (4.15)$$

from which we note that the distribution of $R_{\xi_1\xi_2\xi_3\xi_4}$ is identical with that of the product $u^2 w^2$ where $u$ and $w$ are independent random variables having the beta distributions $B(n-k-2, k)$ and $B(n-k-4, k)$, respectively. Therefore

$$P(R_{\xi_1\xi_2\xi_3\xi_4} < r) = P(u^2 w^2 < r) \qquad (4.16)$$

and denoting $\min_{\xi_4 > \xi_3 > \xi_2 > \xi_1} \{R_{\xi_1\xi_2\xi_3\xi_4}\}$ by $r_4$ we have

$$P(r_4 < r) \leq \binom{n}{4} P(u^2 w^2 < r) \qquad (4.17)$$


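where, omitting details, we find that

$$P(u^2 w^2 < r) = \frac{\Gamma(n-2)\,\Gamma(n-4)}{\Gamma(n-k-2)\,\Gamma(n-k-4)\,[\Gamma(k)]^2} \iint_{u^2 w^2 < r} u^{n-k-3}(1-u)^{k-1}\, w^{n-k-5}(1-w)^{k-1}\,du\,dw, \qquad (4.18)$$

the integration extending over that part of the unit square $0 < u < 1$, $0 < w < 1$ on which $u^2 w^2 < r$. For $k = 1$ (4.18) reduces to

$$P(u^2 w^2 < r) = \tfrac{1}{2}(\sqrt{r})^{n-5}\,[(n-3) - (n-5)r], \qquad (4.19)$$

and for $k = 2$ we find

$$P(u^2 w^2 < r) = \frac{(n-3)!}{6\,(n-7)!}\,(\sqrt{r})^{n-6}\left[\frac{1}{n-6} - \frac{3\sqrt{r}}{n-5} + \frac{3r}{n-4} - \frac{r\sqrt{r}}{n-3}\right]. \qquad (4.20)$$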
Again note that if we choose $r_\alpha$ so that

$$\binom{n}{4} P(u^2 w^2 < r_\alpha) = \alpha, \qquad (4.21)$$

where $P(u^2 w^2 < r)$ is given by (4.18), we obtain

$$P(r_4 < r_\alpha) \leq \alpha. \qquad (4.22)$$

As in the case for $t = 3$, if $N(r)$ is the number of scatter ratios in the set $\{R_{\xi_1\xi_2\xi_3\xi_4}\}$ which have values less than $r$ we have $E(N(r_\alpha)) = \alpha$.
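The closed forms for $k = 1$ and $k = 2$ lend themselves to direct evaluation. The sketch below uses the expressions as I have re-derived them from the stated beta distributions ($u \sim B(n-k-2, k)$, $v \sim B\left(\frac{n-k-3}{2}, \frac{k}{2}\right)$, $w \sim B(n-k-4, k)$); they should be treated as assumptions to be checked against the printed formulas (4.13), (4.19) and (4.20). The function names are illustrative.

```python
import math

def p_three_k2(n: int, r: float) -> float:
    """P(u^2 v < r) for three outliers and k = 2 (needs n >= 6)."""
    s = math.sqrt(r)
    pref = math.factorial(n - 3) / (2 * math.factorial(n - 6))
    return pref * s ** (n - 5) * (1 / (n - 5) - 2 * s / (n - 4) + r / (n - 3))

def p_four_k1(n: int, r: float) -> float:
    """P(u^2 w^2 < r) for four outliers and k = 1 (needs n >= 6)."""
    s = math.sqrt(r)
    return 0.5 * s ** (n - 5) * ((n - 3) - (n - 5) * r)

def p_four_k2(n: int, r: float) -> float:
    """P(u^2 w^2 < r) for four outliers and k = 2 (needs n >= 7)."""
    s = math.sqrt(r)
    pref = math.factorial(n - 3) / (6 * math.factorial(n - 7))
    bracket = (1 / (n - 6) - 3 * s / (n - 5)
               + 3 * r / (n - 4) - r * s / (n - 3))
    return pref * s ** (n - 6) * bracket

def four_outlier_bound(n: int, k: int, r: float) -> float:
    """Upper bound C(n, 4) * P(u^2 w^2 < r) of P(r_4 < r) for k = 1 or 2."""
    p = p_four_k1(n, r) if k == 1 else p_four_k2(n, r)
    return math.comb(n, 4) * p
```

Each probability equals 1 at $r = 1$, as it must, and for the smallest admissible sample sizes the expressions collapse to simple polynomials in $\sqrt{r}$ that can be verified by hand.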
TABLE 2. VALUES OF $r_\alpha$ FOR WHICH THE UPPER BOUND $nP(u < r_\alpha)$ OF $P(r_1 < r_\alpha)$
HAS THE VALUE $\alpha$ [OR EQUIVALENTLY, VALUES OF $r_\alpha$ FOR WHICH
$E(N(r_\alpha)) = \alpha$] FOR THE CASE OF ONE OUTLIER


$\alpha$ = 0.010


number of dimensions
k


sample size


n 1 2 3 4 5
5 0.02795 0.00200 0.00000 
6 -06592 .01406 00111 0.00000 
7 -11026 .03780 .00893 .00071 0.00000 
8 -15547 .06898 .02593 .00632 .00050 
9 .19888 10358 ,04987 .01937 .00476 
10 -23942 -13895 .07781 .03866 01523 
11 .27678 17364 10755 .06200 .03129 
12 .31103 .20689 -13765 -08757 .05126 
13 .34238 ,23835 16726 11407 07362 
14 .37107 .26790 19596 .14065 .09723 
15 .39738 29556 22330 .16678 .12128 
t 16 42156 32141 .24936 19215 .14525 
17 44383 .34555 37404 .21657 .16878 
18 46440 .36810 29737 .23996 19167 
19 48344 .38919 .31940 -26228 -21378 
20 .50112 .40893 -34019 .28354 -23506 
21 51757 42743 .35982 .30376 -25547 
22 53292 “44480 .37835 .32298 .27501 
23 54727 -46113 -39588 .34125 -29370 
24 .56071 47651 41246 .35861 .31155 
25 -57334 -49102 42815 .87513 .32861 
26 .58521 50471 “44304 -39084 
5 5 34491 
27 .59641 .51767 “45716 40580 ‘ 
5 5 36048 
28 .60698 52994 “47057 - 42006 37536 
29 61697 54158 48333 43365 “38959 
30 62644 155263 495 
é ans ; 49547 ,44663 40320 
SË 7 -60048 -54835 50344 
~ A aN 50! 46318 
4 63870 -59091 54949 ELi 
45 72567 669 5 ` Kom 
126 -66994 -62588 58754 5 
50 T4745 69598 65514 "Bibs pi 
i i -653 .61947 .58711 
55 -76583 71803 -67997 6466 
60 78157 -73694 E “E 
70133 67009 
65 9521 75336 Pies 295102 
1954 -71990 69050 
70 «80715 76775 75 pasi DB 
E : 5 13620 7084 
75 81769 78048 5 ps E8806 
. 15062 72439, 
80 82708 i pc 
e poal 79181 -76348 - 73851 71563 
f «83549 -80197 «77503 75124 ; 
90 .84308 81115 5 3 PË 
të BITE 78545 76274 
95 «84995 81946 79490 i pi 
100 . 85622 .82704 8035 oe “T8038 
i 80352 78271 76361 
-92016 -90435 .89155 
300 94392 93293 92411 sk ge 
206 TË esa ËT, ae .91625 .90899 
506 pë Ten -94125 .93525 . 92969 
: .95190 94704 94254 
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MULTIVARIATE STATISTICAL OUTLIERS 


TABLE 2. VALUES OF $r_\alpha$ FOR WHICH THE UPPER BOUND $nP(u < r_\alpha)$ OF $P(r_1 < r_\alpha)$
HAS THE VALUE $\alpha$ [OR EQUIVALENTLY, VALUES OF $r_\alpha$ FOR WHICH
$E(N(r_\alpha)) = \alpha$] FOR THE CASE OF ONE OUTLIER (Continued)


$\alpha$ = 0.025


number of dimensions
k


sample size


n 1 2 3 4 5
5 0.05124 0.00500 0.00002 
6 - 10353 -02589 -00278 0.00001 
7 - 15787 - 05976 -01647 .00179 © 0.00000 
8 . 20934 -09953 -04111 -01166 -00125 
9 - 25636 - 14057 .07219 -03075 -00879 
10 29873 - 18053 - 10601 -05606 .02420 
11 -33677 .21834 - 14030 - 08466 -04541 
12 37094 - 25361 . 17380 - 11452 .07008 
13 -40170 - 28629 - 20589 14441 09644 
14 42950 31647 23027 - 17360 -12331 
15 45471 34433 26485 -20171 - 14998 
16 47768 37007 + 29165 + 22854 .17601 
17 - 49867 .39387 31674 + 25400 -20114 
18 51794 41593 -34022 27811 . 22525 
19 535069 -43642 36221 30089 + 24827 
20 - 55208 15547 38281 32239 + 27020 
21 56727 47324 40213 34269 29106 
22 58139 -48984 -42028 -36187 -31088 
23 59455 50538 43735 37999 2971 
24 .60685 51996 45343 39713 34760 
25 -61836 53367 - 46860 -41336 36460 
26 -62917 . 54657 .48292 42873 -38076 
27 -63934 . 55874 -49647 44332 39614 
28 - 64891 - 57025 .50930 45717 41079 
29 - 65796 -58113 52147 47034 42475 
30 .66651 59144 “53303 48287 43806 
35 + 70317 - 63587 - 58306 -53737 49626 
40 73208 67113 - 62301 .58117 54333 
45 75551 - 69982 - 65567 .61711 .58213 
50 - 77492 «72365 . 68286 -64715 -61466 
55 - 79128 -74378 -70589 . 67264 64232 
60 .80527 .76102 , 72564 59454 -66613 
65 -81738 . 77596 - 74278 + 71357 - 68685 
70 .82798 . 78904 - 75780 - 73027 - 70504 
75 - 83733 - 80060 . 77108 - 74503 72116 
so .84566 81088 78291 -75820 73553 
85 .85312 82010 79352 77001 74844 
90 . 85984 - 82841 - 80308 - 78067 - 76009 
95 86594 -83595 .81176 .79034 .77066 
100 .87150 - 84281 -81967 79916 «78030 
| 200 ` 92829 -91280 -90028 -88914 87885 
300 94949 93871 93008 92240 91530 
400 -96077 95240 94579 -93993 -93450 
500 .96783 - 96093 -95555 95082 94642 
TABLE 2. VALUES OF $r_\alpha$ FOR WHICH THE UPPER BOUND $nP(u < r_\alpha)$ OF $P(r_1 < r_\alpha)$
HAS THE VALUE $\alpha$ [OR EQUIVALENTLY, VALUES OF $r_\alpha$ FOR WHICH
$E(N(r_\alpha)) = \alpha$] FOR THE CASE OF ONE OUTLIER (Continued)


$\alpha$ = 0.050


number of dimensions
k


sample size


n 1 2 3 4 5
5 0.08083 0.01000 
6 -14529 -04110 0.00556 
7 20661 ,08452 .02020 0.00358 

8 .26161 -13133 -05831 -01856 0.00251 

F n “7 .09559 04367 .01400 
10 .35261 .22007 18408 .07438 03440 
n .39008 .25965 17171 10731 .06033 
12 42325 ,29584 .20751 (14050 .08896 
13 45277 32886 24112 17285 11850 
14 47921 35897 27245 .20383 14785 
15 .50302 .38650 -30154 -23319 17642 
16 52457 41171 32855 .26086 .20386 
17 .54417 “43487 .35361 .28686 .23002 
18 .56208 45620 .37690 31124 (25486 
19 -57852 -47591 -39857 33412 27837 
20 .59365 49417 41876 .35558 .80060 
a rA sous “48761 .37573 .32160 
62 52692 45525 39467 .34145 

23 .63267 -54166 47178 41249 -36021 
24 .64391 .55545 48729 42929 .37796 
25 65443 -50838 .50188 «44513 39477 
26 .66429 -58053 .51563 46010 “41069 
27 .67355 59197 .52860 47426 42580 
28 68226 .60276 54086 48767 44014 
29 69048 .61296 55247 50040 We 
30 -69825 -62260 .56347 51248 46674 
35 13146 66402 -61090 56478 “62914 
40 15758 69675 64857 60654 pë 4 
45 7872 72330 67924 64067 cat 
50 -79621 + 74532 + 70472 66909 ; me 
65 .81094 76388 72624 69314 ents 
60 82354 MITE «74467 71377 Poni 
65 ,83444 «79351 «16065 73167 a 
70 .84398 .80554 .77464 74735 aon 
75 85240 .81616 .78700 76122 os 
te a ae 

: 83408 .80786 78463 Se 

90 «87267 84172 81674 79462 en 
95 .87816 84864 .82480 80367 pr 
100 .88317 .85494 .83214 81192 Ea 
200 93447 91924 90696 89602 erie! 
300 95372 “94310 93463 92711 pos 
400 .96399 95573 94924 94351 ee 
500 .97043 .96361 95833 95370 pës 


TABLE 2. VALUES OF $r_\alpha$ FOR WHICH THE UPPER BOUND $nP(u < r_\alpha)$ OF $P(r_1 < r_\alpha)$
HAS THE VALUE $\alpha$ [OR EQUIVALENTLY, VALUES OF $r_\alpha$ FOR WHICH
$E(N(r_\alpha)) = \alpha$] FOR THE CASE OF ONE OUTLIER (Continued)


$\alpha$ = 0.100


number of dimensions
k


sample size


n 1 2 3 4 5
5 0.10000 0.02000 0.00025 
6 + 20000 -06525 -01114 0.00012 
7 .26960 .11952 -04172 -00717 0.00007 
8 32610 -17328 -08282 -02959 -00502 
9 -37418 22314 .12675 -06216 -02234 
10 41540 - 26827 - 16978 -09888 04901 
11 -45106 -30878 .21038 - 13629 - 08032 
12 48221 34511 24801 . 17267 -11319 
13 50966 37776 28264 20723 14593 
14 53405 40719 81442 23967 17764 
15 55586 43383 84358 - 26995 -20789 
16 -57550 45804 87037 29813 23651 
17 59328 48014 39502 «32433 - 26346 
18 - 60948 -50038 41777 -34870 28878 
19 . 62428 51899 43881 37139 31254 
20 63789 53615 45832 39255 33484 
21 -65043 .55205 47645 41231 85578 
22 66205 56680 49334 -43079 87545 
23 67282 58053 50912 44812 - 39396 
24 - 68286 59335 52389 -46438 -41140 
25 69223 .60535 53774 47967 42784 
26 - 70101 - 61660 -55075 -49408 44337 
27 70925 .62717 .56301 50767 45806 
28 71699 -63713 57457 52052 47197 
29 + 72429 - 64653 .58549 53268 48516 
30 73119 .65540 59583 54420 49768 
35 - 76063 - 69342 - 64023 -59385 -55182 
40 + 78375 + 72335 -67531 - 63326 -59499 
| 45 + 80245 - 74758 - 70379 - 66532 - 63023 
I 50 .81792 . 76763 «72738 -69195 -65955 
55 -83094 78451 -74728 -71443 -68435 
60 «84208 -79895 -76430 - 73369 - 70561 
65 -85173 .81145 - 77904 - 75037 - 72404 
70 . 86017 . 82238 :79193 - 76497 - 74019 
75 - 86763 . 83203 - 80331 «77787 -75446 
80 - 87427 - 84061 -81344 -78935 76717 
85 .88023 . 84830 .82251 «79963 -77856 
90 . 88560 . 85524 . 83069 . 80891 - 78883 
95 + 89048 .86152 . 83811 -81731 . 79814 
100 -89492 -86725 84486 82497 -80663 
200 94067 92574 91371 90300 89308 
300 95796 94751 93923 93186 92503 
400 - 96722 -95908 -95272 -94711 -94189 
500 97304 96631 96113 95661 95238 
TABLE 3. VALUES OF $\sqrt{r_\alpha}$ FOR WHICH THE UPPER BOUND $\binom{n}{2} P(u^2 < r_\alpha)$ OF
$P(r_2 < r_\alpha)$ HAS THE VALUE $\alpha$ [OR EQUIVALENTLY, VALUES OF $\sqrt{r_\alpha}$ FOR
WHICH $E(N(r_\alpha)) = \alpha$] FOR THE CASE OF TWO OUTLIERS


$\alpha$ = 0.010


number of dimensions
k


sample size


n 1 2 3 4 5
5 0.03162 0.00050 
6 -08736 -01498 0.00022 
7 -14772 04982 -00896 0.00012 
8 20444 09374 -03349 -00601 0.00007 
9 -25544 .13926 06744 -02449 ,00433 
10° -30069 -18308 10490 05181 01887 
11 .34076 -22397 .14265 . 08337 04152 
12 .37636 -26160 17912 11626 06861 
13 40812 -29606 .21363 14889 09761 
1 43660 -32758 .24595 -18044 .12699 
15 -46228 -35641 -27605 .21050 15589 
16 -48553 38284 -30403 -23893 -18382 
17 -50670 40713 -33002 -26568 .21057 
18 -52604 .42952 -35419 29082 -23601 
19 -54380 -45019 -37667 -31441 -26013 
20 -56016 .46935 39764 33655 28296 i 
21 -57528 -48714 41721 35735 30454 
22 158930 .50371 43551 .37689 32494 AE 
23 .60234 51918 45266 -39528 -34423 
24 -61451 .53365 -46876 .41261 -36248 
25 .62588 54722 48390 42895 87975 
26 63655 .55996 49816 44439 39611 
27 64657 -57196 -51162 45899 41164 
28 -65599 58328 52434 47282 42637 
29 .66489 59897 -53637 48594 44037 
30 67329 -60409 -54778 .49839 45370 
35 70925 -64753 -59694 .55228 51159 | 
40 73753 -68184 -63596 59527 55804 
45 76043 -70968 -66772 .63038 .59611 
50 - 77937 - 73276 -69411 - 65962 - 62789 
po -79532 -75222 -71638 -68436 -65482 
80 -80897 - 76887 -13547 «70556 -67796 
65 -82077 -78328 -75201 -72396 -69804 
70 83111 -79590 -76649 -74009 71566 
75 -84023 -80704 -77928 -15434 «73124 ‘ | 
-84835 .81695 -79066 - 76703 74512 
5 .85563 82583 -80086 «77840 
90 -86219 7 i 
“83384 81007 - 78866 76880 
95 86814 84110 81841 A 
100 -87356 84771 E ps 
i . -82601 -80645 «78828 
200 . 92902 -91518 -90349 -89291 -88304 
300 94974 -94023 -93218 -92489 91808 
400 96076 -95349 94734 A p Ee 
500 . 96766 -96180 95 pë Bue 
9 -95679 -95226 94804 
TABLE 3. VALUES OF $\sqrt{r_\alpha}$ FOR WHICH THE UPPER BOUND $\binom{n}{2} P(u^2 < r_\alpha)$ OF
$P(r_2 < r_\alpha)$ HAS THE VALUE $\alpha$ [OR EQUIVALENTLY, VALUES OF $\sqrt{r_\alpha}$ FOR
WHICH $E(N(r_\alpha)) = \alpha$] FOR THE CASE OF TWO OUTLIERS (Continued)


$\alpha$ = 0.025


number of dimensions
k


sample size


n 1 2 3 4 5
5 0.05000 0.00125 
6 .11856 .02376 0.00056 
7 18575 -06794 01422 ` 0.00030 
8 - 24556 -11852 -04575 -00954 0.00018 
9 -29758 . 16820 - 08546 - 03346 -00687 
10 34274 21444 12703 .06572 -02579 
11 38212 .25661 16755 10110 .05269 
12 41670 29479 20581 13678 08327 
13 44728 -32932 24141 17138 11495 
14 47453 .36058 27432 .20427 14634 
15 ` .49896 38897 -30468 -23522 «17670 
16 52099 41483 .33266 .26419 .20568 
17 54097 43848 35849 29124 23314 
18 .55918 .46017 .38237 -31647 .25905 
19 .57585 48014 40450 -34003 -28346 
20 .59118 49859 42504 36203 -30642 
21 -60532 .51567 .44415 -38260 32802 
22 61842 .53155 46197 40187 34835 
23 .63058 .54633 .47863 -41995 .36751 
24 .64191 .56014 49423 43694 38557 
25 .65250 .57307 50887 45292 40262 
26 -66242 .58520 .52263 46799 41874 
27 .67173 .59660 .53560 48221 43399 
28 .68049 .60734 54784 49566 44844 
29 . 68874 .61748 55941 50839 46215 
30 .69653 .62707 .57036 .52046 47517 
35 . 72985 .66814 .61742 67252 53152 
40 . 75603 . 70050 65465 61389 , .57651 
45 .77720 72671 .68487 64757 .61326 
50 79471 .74841 .70994 .67655 .64386 
55 .80946 .76669 78108 .69919 .66975 
8 60 .82207 . 78233 74917 71944 .69195 
65 .83300 . 79586 . 76484 . 73699 .71121 
70 84255 80770 «77855 «75236 72808 
75 85099 .81815 -79066 -76593 74299 
80 .85851 . 82746 80144 - 77801 ~ 75628 
i 85 86524 .83579 .81109 78884 -76818 
a 90 .87132 .84330 81980 79861 . T7892 
$ 95 .87683 .85012 -82769 .80746 78866 
100 .88185 .85632 .83487 «81552 79753 
200 93335 91971 90818 89774 88801 
300 . 95267 -94330 . 93537 .92818 92146 
400 .96298 . 95582 .94976 94426 93912 
500 96944 -96350 .95873 95427 .95010 


SANKHYA : THE INDIAN JOURNAL OF STATISTICS : SERIES A


TABLE 3. VALUES OF $\sqrt{r_\alpha}$ FOR WHICH THE UPPER BOUND $\binom{n}{2} P(u^2 < r_\alpha)$ OF
$P(r_2 < r_\alpha)$ HAS THE VALUE $\alpha$ [OR EQUIVALENTLY, VALUES OF $\sqrt{r_\alpha}$ FOR
WHICH $E(N(r_\alpha)) = \alpha$] FOR THE CASE OF TWO OUTLIERS (Continued)


$\alpha$ = 0.050


number of dimensions
k


sample size


n 1 2 3 4 5
5 0.07071 0.00250 0.00000 
6 14938 .03872 00111 
7 .22090 -08601 -02019 0.00060 
8 28207 14167 .05800 .01355 0.00036 
9 -33403 .19419 - 10237 -04245 -00975 
10 .87841 .24188 14702 .07881 08273) 
u 41670 .28462 .18945 11716 06322 7 | 
12 .45006 .32286 ,22884 (15489 .09657 
13 47939 85712 .26504 19086 (18029 ) 
14 50539 38792 29819 22462 16313 
15 52863 41573 32853 .25609 (19451 
16 154952 «44096 -35634 .28532 22417 
17 .56842 “46394 .38188 181244 .25207 
18 158562 48496 “40540 188763 .27823 
19 .60134 50426 42712 36103 -30274 
20 .61578 52205 «44722 88282 .82571 
2 .62908 53849 «46588 40313 «34723 
22 64139 55374 «48324 -42210 «36743 
23 65282 56793 -49944 «43986 38641 
24 66346 58117 51459 45651 40426 
25 67339 59354 .52878 47215 42108 
26 68269 60515 .54211 “48687 43695 
27 69141 61604 55465 50076 45194 
28 69962 .62630 56648 .51386 46612 
29 70735 .63598 57765 152625 47955 
30 11465 64513 58821 153799 49230 
35 74583 68424 63351 58850 -54730 
40 .77032 71501 66926 62850 .59106 
45 79013 73992 69824 66101 .62671 
50 .80652 76052 72224 .68798 65635 
55 .82032 «77187 «74247 “71073 .68138 
60 -83213 9271 «75978 «73021 -70284 
65 -84236 -80555 -77476 74708 72143 
70 -85132 -81678 -78787 76185 73772 
75 85922 .82670 9944 «77489 75210 
80 -86627 -83553 80974 - 78650 76491 
85 87259 84343 81896 79689 pm 
90 .87829 .85056 .82798 80627 P 
95 88346 «85703 -83482 ‘81477 ‘ane 
100 .88817 .86292 ; “aa 
200 93664 ap poa “e 
‘ 92316 91177 D 
300 -95490 .9456 et ig S0 
i dësa ite .93069 92405 


-95160 -94616 -94107 
500 -97080 -96500 -96021 -95580 -95167 
TABLE 3. VALUES OF $\sqrt{r_\alpha}$ FOR WHICH THE UPPER BOUND $\binom{n}{2} P(u^2 < r_\alpha)$ OF
$P(r_2 < r_\alpha)$ HAS THE VALUE $\alpha$ [OR EQUIVALENTLY, VALUES OF $\sqrt{r_\alpha}$ FOR
WHICH $E(N(r_\alpha)) = \alpha$] FOR THE CASE OF TWO OUTLIERS (Continued)


$\alpha$ = 0.100


number of dimensions
k


sample size
n 1 2 3 4 5


5 0.10000 0.00501 
6 -18821 04791 0.00223 
7 - 26269 10904 -02872 0.00119 
8 32402 .16955 07368 01927 0.00072 
9 .37493 22444 .12283 -05397 .01386 
10 41780 27305 17039 09468 .04161 
11 45442 31592 .21449 - 13599 -07599 
12 48609 35381 25473 17565 11219 
13 51380 38747 .29126 ,21282 14791 
14 53826 41753 32440 24728 . 18212 
15 .56006 44453 .35452 -27909 21439 
16 57962 46892 39196 30842 24461 
17 59728 49106 40705 33548 27282 
18 -61332 51125 -43006 36047 .29911 
19 62797 52974 45124 38360 -32362 
20 -64140 54676 47079 -40506 84649 
4 k 21 65378 56246 ASSSI 42501 36785 
22 66522 57700 50571 44360 88783 
23 -67584 -59051 -52137 -46095 -40655 
24 68572 -60310 -53598 47720 42412 
25 -69494 61487 -54966 49243 44064 
26 - 70357 62589 .56250 50675 45619 
27 71167 63623 57456 -52023 47086 
28 -71928 -64596 -58592 “53293 48472 
29 - 72646 65513 59664 54494 49784 
30 73323 -66380 -60677 -55631 51026 
35 76216 - 70082 -65015 -60508 «56375 
‘ 40 78489 72990 68431 64361 -60614 
45 - 80328 75343 -71196 -67486 -64061 
50 .81850 177288 -73485 -70074 -66921 
55 -83133 78926 75413 72257 -69335 
60 84231 .80327 77061 74124 71401 
65 85183 .81539 . 78488 75740 73191 
70 .86017 . 82600 . 79736 77155 74758 
75 86754 83537 80837 -78403 76141 
80 87410 84370 81817 -79514 -77372 
85 87999 85118 - 82696 - 80509 -78475 
90 88531 .85791 83488 81407 79470 
95 89014 - 86402 84205 - 82220 - 80372 
y 100 89454 86960 84859 82961 -81193 
200 93994 92664 91538 ` 90518 89565 
300 - 95713 -94799 94025 93322 .92666 
400 . 96635 95936 -95345 -94807 -94305 
500 97215 -96650 .96169 -95733 95326 
CORRIGENDA
Introducing Volume Twentyfive : By P. C. Mahalanobis, Sankhya, Series A, 25, 3. 
Footnote 4 should be replaced by the following : 
Annual Review by P. C. Mahalanobis, Indian Statistical Institute Annual Report : 
1960-61, 78-81. 
Address by R. A. Fisher at the First Convocation of the Indian Statistical Institute, 
12 February, 1961. 


(1) Approximate Probability Values for Observed Number of Successes : 
By John E. Walsh, Sankhya, 15, 281-290. 

(2) Definition and Use of Generalized Percentage Points : By John E. 
Walsh, Sankhya, 21, 281-288. 

A quantity C,(x,, ta), or Cd), ne), is defined and used in the papers (1) and (2), also 
in Handbook of Nonparametric Statistics : Investigation of Randomness, Moments, Per- 
centiles, and Distributions, D. Van Nostrand Co., 189-190, as Cilsj, 82). The expression for 
Cylu,, ta) is accurate for v = 2 but inaccurate for v > 3. A correct expression is 

z pa fo) rës : 
> (EA ( ) x Pn—o(t—j), 
J=0 J az, 


where pple) = ta Deia for v— 01, e e and is zero otherwise. The author of (1) 


and (2) is indebted to Frederic M. Lord for calling his attention to this discrepancy and for 
furnishing the correct expression. Also, as noted in (2), 30°C,(a,, xa) should be replaced by 


FPOe t'a) in the expansion on page 283 of (1). 


ADDENDA 


Statistics Proposed for Various Tests of Hypotheses and their Distributions in Particular Cases : By O. P. Bagai, Sankhya, Series A, 24, 409-418.


The results (4.3) and (4.4) can, respectively, be put in a standard form of the genera- 
lized Gauss’ Hypergeometric series as follows : 


I = 8a*[(1—2y) —log 4a] oF,(; 3 ; 4a) 


titer oar atta tart) +L ty La ) 
AAt HHH pt) 
ma Mou (ët) Sotirit) 
E EFE si). 
anG E) 
—2(y-+log a) 5; ee z) 


where, in both, y is an Euler constant and the symbol „F, is defined as follows : 


1 a 1 a 
pie fat Pa to) = 1--—— £ 
oP gli Fi: Far fja) = At Tiley ACES TACs Caray apt 


Similar change may be made in (4.16). 
Distribution of the Determinant of the Sum of Products Matrix in the Noncentral Linear Case for Some Value of p : By O. P. Bagai, Sankhya, Series A, 24, 55-62.


The results (2.5), (2.8), (2.9), and (2.10) can, further, be put in a standard form of the 
oF, is defined as follows :